Landauer's Principle states that the work cost of erasure of one bit of information has a fundamental
lower bound of kT ln(2). Here we prove a quantitative Landauer's principle for arbitrary
processes, providing a general lower bound on their work cost. This bound is given by the min- 
imum amount of (information theoretical) entropy that has to be dumped into the environment, 
as measured by the conditional max-entropy. The bound is tight up to a logarithmic term in the 
failure probability. Our result shows that the minimum amount of work required to carry out a 
given process depends on how much correlation we wish to retain between the input and the output 
systems, and that this dependence disappears only if we average the cost over many independent 
copies of the input state. Our proof is valid in a general framework that specifies the set of possible 
physical operations compatible with the second law of thermodynamics. We employ the technical 
toolbox of matrix majorization, which we extend and generalize to a new kind of majorization, 
called lambda-majorization. This allows us to formulate the problem as a semidefinite program and 
provide an optimal solution. 



Introduction. — Landauer's Principle [1, 2], and more 
generally the relation between the second law of ther- 
modynamics and information theory, has received much 
attention in the past decades. Studies have notably focused
on fundamental limits on heat generated by computation [2],
the exorcism of Maxwell's demon via information theory
(see e.g. [3]), and generalizations to quantum settings such
as the characterization of entanglement through
thermodynamical considerations [4], or the determination of
the work cost of information erasure with the help of
quantum side information [5].

Landauer's Principle can be stated in the following 
way. Consider the erasure process of an unknown bit, 
i.e. the logical operation that resets the bit to a reference 
state (e.g. zero). Landauer's Principle asserts that any 
physical implementation that performs this erasure, us- 
ing a heat bath at temperature T, has a work cost of at 
least kT ln(2), where k is the Boltzmann constant. More 
generally, Landauer noted that all irreversible operations, 
and not only the erasure of a bit, must cost work due to 
the transfer of entropy from the information-bearing de- 
grees of freedom to the environment, which causes the 
system to dissipate heat. Bennett refined the formula- 
tion of this principle and showed its relevance in thermo- 
dynamics (exorcising the Maxwell demon [2, 3]) and in 
computation [6]. 
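
As a rough numerical illustration of the bound kT ln(2) quoted above, the following minimal sketch evaluates it at a given bath temperature. The function name `landauer_bound` and the choice of 300 K are illustrative assumptions, not part of the original text; the Boltzmann constant is the exact SI value.

```python
import math

# Boltzmann constant in J/K (exact by definition in the 2019 SI).
BOLTZMANN_K = 1.380649e-23


def landauer_bound(temperature_kelvin: float, bits: float = 1.0) -> float:
    """Minimum work (in joules) to erase `bits` fully unknown bits
    with a heat bath at the given temperature: bits * k * T * ln(2)."""
    if temperature_kelvin <= 0:
        raise ValueError("temperature must be positive")
    return bits * BOLTZMANN_K * temperature_kelvin * math.log(2)


if __name__ == "__main__":
    # Illustrative temperature only; any positive value works.
    print(f"{landauer_bound(300.0):.3e} J per bit at 300 K")
```

At room temperature the bound is of the order of 10^-21 joules per bit, far below the dissipation of present-day logic devices.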

The work cost of thermodynamic processes in the con- 
text of information theory has been studied for various 
classical and quantum systems. Szilard [7] originally
considered a single-particle gas enclosed in a box with a
piston and noted that kT ln(2) work could be reversibly
extracted from the gas at the expense of losing the
information about which side of the piston the particle is on.
The reverse process corresponds to erasing this information,
bringing the particle on one definite side at kT ln(2)
work cost. Landauer [1, 8] studied the example of a par- 
ticle in a double-V shaped potential, which represents 
a bit of information, and showed that its erasure costs 



work. While these results apply to fully unknown bits, 
the bounds have to be adapted if the system we erase is 
partially known. In such a case, the average amount of 
work needed is lower bounded by kT ln(2) H(X), where
H(X) is the Shannon entropy of the system X and where 
the average is taken over many independent repetitions 
of the erasure process [3, 9, 10]. This result has been 
derived and extended in several contexts such as using 
quantum computers performing data compression [11], 
Hamiltonian models [12] or in a resource theory framework
[13, 14]. This bound can also be generalized to other
processes, for which the average work cost is then given
by the amount of entropy the process transfers into the
environment. We refer to Janzing [15, 16] for a proof in 
a resource theory framework. 
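
The average bound kT ln(2) H(X) for a partially known system can be made concrete with a short sketch. The helper names below are hypothetical; the code simply evaluates the Shannon entropy in bits and multiplies by kT ln(2).

```python
import math


def shannon_entropy(probs):
    """Shannon entropy H(X) in bits of a probability distribution.
    Outcomes with zero probability contribute nothing."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return -sum(p * math.log2(p) for p in probs if p > 0)


def average_erasure_work(probs, temperature_kelvin, k=1.380649e-23):
    """Average lower bound k*T*ln(2)*H(X), in joules, on the work needed to
    erase a system with distribution `probs` (averaged over many repetitions)."""
    return k * temperature_kelvin * math.log(2) * shannon_entropy(probs)


if __name__ == "__main__":
    # A biased bit is cheaper to erase on average than a uniform one.
    print(shannon_entropy([0.5, 0.5]), shannon_entropy([0.9, 0.1]))
```

A fully known system (H(X) = 0) costs nothing to reset on average, while a uniformly random bit recovers the kT ln(2) figure.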

Generalizations to a single-shot regime, where state- 
ments are made about individual processes rather than 
many repetitions of them, have been proposed, for ex- 
ample in terms of majorization conditions [13], and in 
terms of entropic quantities which take into account ap- 
proximate transitions and a probability of failure [17- 
19]. Explicit Hamiltonian models have also been used 
to study the case of erasure with quantum side informa- 
tion [5]. It is usually assumed that the system carry- 
ing the information has a degenerate Hamiltonian. More 
recently, these thermodynamic considerations have been 
extended to the case of non-degenerate Hamiltonians [19- 
21], and the majorization condition also adapted to this 
scenario [17, 19, 21], based on ideas from [22-25]. 

In the present article, we revisit Landauer's principle in 
the light of general quantum processes. Our main result 
is an explicit and rigorous expression for the fundamental
minimal work cost of any process $\mathcal{E}$ that acts on a
system X and brings it from a state $\sigma$ to a new state $\rho$.
The bound is robust, i.e. it holds even if one tolerates
an error probability $\epsilon$. The work cost W of such a
process is lower bounded by the amount of entropy that has
to be dumped into the environment, as measured by the
smooth conditional max-entropy [26, 27],

$$ W \geq kT\ln(2)\, H_{\max}^{\epsilon}(E|X)_{\rho} \,. \tag{1} $$

Here, the entropy is evaluated for the state $\rho$ which is
a purification of the output state obtained by applying
the process $\mathcal{E}$ to a purification of the input state
$\sigma_X$ (see Proposition 3). The entropy measure,
$H_{\max}^{\epsilon}$, is part of the smooth entropy framework
that is widely used in single-shot quantum information
theory [26-30]. Its formal definition will be given later.

Our quantitative Landauer's Principle is tight up to
logarithmic terms in the failure probability of the
implementation of the process $\mathcal{E}$. Indeed, we can devise
an explicit process carrying out the requested mapping
$\mathcal{E}$ that is nearly optimal. This near-optimal process is
based on the scheme proposed by del Rio et al. [5], which
erases a system using available quantum side information.

Our bound is valid in a general framework that speci- 
fies the set of physically allowed operations. This frame- 
work conceptually separates the operations that are in- 
trinsically thermodynamical (e.g., the erasure of infor- 
mation) from those that simply correspond to reversible 
information processing (e.g., unitaries and the addition 
of ancillas). The former will be those that cost work or 
that are capable of extracting work from a system; the 
latter are done for free, i.e. at no work cost. We assume 
that our systems have a completely degenerate Hamil- 
tonian. The set of allowed operations is motivated by 
the second law of thermodynamics, which forbids cyclic 
processes whose net effect is to extract work. 

For the proofs we use a characterization of our frame- 
work by a relatively simple and intuitive generalization 
of the notion of majorization which is inspired by previ- 
ous work where the eigenvalues of the input are rescaled 
until the input majorizes the output [17, 19], achieved 
for example by appending a work system [21]. We term 
our generalisation lambda-majorization, and provide a 
mathematical characterization of this notion in terms of 
completely positive maps that satisfy some normalization 
conditions. 

In the asymptotic limit of many independent and identically
distributed (i.i.d.) copies of these systems (i.e., the
process is repeated $n \to \infty$ independent times,
$\mathcal{E}^{\otimes n}$, on n i.i.d. input states
$\sigma^{\otimes n}$), we obtain as a corollary of our
main result a value for the average work cost per copy,

$$ \langle W \rangle \geq \left[ H(X)_{\sigma} - H(X)_{\rho} \right] kT\ln(2) \,, \tag{2} $$

which is in agreement with the informal formulation of 
Landauer's principle, that the work cost of any process is 
determined by the decrease of entropy in the information- 
bearing degrees of freedom (see [16] for a proof in a re- 
source framework). 

We should point out that the general bound (1) can be 
arbitrarily larger than the average bound (2). This
deviation highlights an important feature, namely that
correlations between the input and the output of the
transformation play a significant role in the single-shot regime.
It is important to not only consider the input and output 
states, but also the whole process, or computation, that 
is performed on the actual input. This is natural and 
generalizes the classical case where this consideration is 
obvious, since a classical computer acts on the actual 
state of a register and not on its probability distribution. 
In the quantum case, we specify the full algorithm (or 
computation) as a completely positive map, which inher- 
ently tells us which correlations are preserved between 
the input and output systems. While the transformation 
of a state into another (e.g. in a resource theoretic ap- 
proach) is a relevant question, we focus in this paper on 
the case where the computation is given, thus fixing all 
the correlations that are preserved or destroyed between 
the input and the output. 

As a simple example, consider X to be a fully mixed
qubit, i.e. in the state $\sigma_X = \mathbb{1}_2/2$. Suppose we wish to
transform this state into another fully mixed qubit again,
$\rho_X = \mathbb{1}_2/2$. There are two obvious processes that achieve
this goal: we may (a) simply copy the input qubit to the 
output, or (b) throw away the input and prepare a new 
fully mixed qubit. Both processes (a) and (b) provide the 
required output. However, if we had information about 
the specific state in which the qubit initially was (e.g. 
suppose we had kept a qubit C that was maximally en- 
tangled with the input), then in the case of process (a), C 
would remain entangled with the output; however in the 
case of (b), C would have lost all correlations with the 
output qubit. In this first example, both processes cost 
no work: (a) is the identity process, and in (b), the work 
dissipated to erase the qubit is retrieved again when we 
prepare a new mixed qubit. 

However, the work costs of these processes differ if we 
consider less trivial input and output states. Let X be
a quantum system composed of n + 1 qubits, in a state
$\sigma_X$ where the first qubit is randomly zero or one with
probability 1/2, and the n remaining qubits are either
all zero if the first qubit is zero, or all in a fully mixed
state if the first qubit is one. This state has the
distribution $\{1/2,\, 2^{-(n+1)},\, 2^{-(n+1)},\, \ldots,\, 2^{-(n+1)}\}$
and is depicted in Figure 1. Suppose that we wish to bring
this system into the state $\rho_X = \sigma_X$, i.e. the same state
as the input state, using either process (a) or (b) again.
Process (a) would simply copy the input to its output, and
would not cost any work, since it is the identity channel.
However, process (b) first has to erase the input state and
then prepare the output state. If we are lucky, the n qubits
are in state $|0\ldots0\rangle\langle0\ldots0|$ (if the first qubit is
$|0\rangle$) and we can just erase the first qubit using kT ln(2)
work. However, if we want to erase the system with
certainty, we have to consider the worst case in which we
have to erase n fully mixed qubits (which occurs with the
non-negligible probability 1/2). So the erasure work cost
may be as bad as (n + 1) kT ln(2). In order to prepare this
state again as the output of the process, we may think of
tossing a coin to decide in which state
$|0\ldots0\rangle\langle0\ldots0|$ or $|1\rangle\langle1| \otimes 2^{-n}\mathbb{1}_{2^n}$
to prepare X in. If we are lucky, we have to prepare a
FIG. 1: The probability distribution of a state in which
single-shot effects become important, even for large systems.
A register of n + 1 qubits is in a state $\rho$ such that if the
first qubit is zero (with probability 1/2), then all the rest
are zero too; if the first qubit is one (with probability 1/2),
then all the rest are in a fully mixed state. The spectrum has
a large eigenvalue (1/2), but also has a large support size
($2^n + 1$); as a consequence, $H_{\min}(\rho) \approx 1$ and
$H_{\max}(\rho) \approx n$ can differ by an arbitrary amount.



mixed state on n qubits and extract n kT ln(2) work in
the process, but in the worst case, we have to prepare
$|0\ldots0\rangle\langle0\ldots0|$ and can't extract more than just kT ln(2)
(from the coin toss). Hence, in the worst case, process
(b) costs a total of n kT ln(2) work, which can be
arbitrarily larger than the (zero) cost of process (a); in fact,
the gap diverges as $n \to \infty$.
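
The gap between the single-shot and average pictures in this example can be checked numerically. The sketch below builds the spectrum described above and evaluates three entropies of it. As an assumption, the max-type quantity is taken here to be the Rényi-zero entropy (logarithm of the support size), which is what the support-size argument of Figure 1 suggests; other conventions for the max-entropy give the same scaling in n. All function names are illustrative.

```python
import math


def example_spectrum(n):
    """Eigenvalues of the (n+1)-qubit example state: one eigenvalue 1/2
    and 2**n eigenvalues equal to 2**-(n+1)."""
    return [0.5] + [2.0 ** -(n + 1)] * (2 ** n)


def h_min(spectrum):
    """Min-entropy: minus log2 of the largest eigenvalue."""
    return -math.log2(max(spectrum))


def h_zero(spectrum):
    """Renyi-zero entropy: log2 of the support size (assumed max-type measure)."""
    return math.log2(sum(1 for p in spectrum if p > 0))


def shannon(spectrum):
    """Shannon (von Neumann) entropy in bits."""
    return -sum(p * math.log2(p) for p in spectrum if p > 0)


if __name__ == "__main__":
    for n in (2, 10, 20):
        s = example_spectrum(n)
        print(n, h_min(s), h_zero(s), shannon(s))
```

The min-entropy stays at one bit while the support-size entropy grows like n, and the Shannon entropy sits in between at n/2 + 1.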

This example shows that in the general single-shot 
regime, the specification of only the input state $\sigma_X$ and
the output state $\rho_X$ does not suffice, and correlations
between the input and the output contribute to determine
the minimal work cost of the process (although these cor- 
relations are not relevant in the asymptotic i.i.d. regime). 
Our result (1) incorporates this property intrinsically and 
provides a bound that is valid for any given process. 

The remainder of this paper is organized as follows. 
We will first present the mathematical framework used 
to model thermodynamic processes. We then introduce 
lambda-majorization, which captures all possible opera- 
tions in our framework. Lambda-majorization is charac- 
terized in terms of completely positive maps that satisfy 
some specific normalization conditions, and we use this 
characterization to derive the main result by formulat- 
ing the problem as a semidefinite program. The latter 
is solved by providing optimal primal and dual feasible 
plans with the same value, which guarantees optimal- 
ity of the result. Finally, some special cases are derived 
which recover some previously known results. 

Framework. — Consider a quantum mechanical system 
X in an initial state described by the density operator
$\sigma$. Our task is to bring the system X to another state
$\rho$, while attempting to maximize some kind of notion of
"extracted" work in the process. Throughout this pa- 
per we assume that the Hamiltonians of the systems we 
consider are completely degenerate. 

We first postulate two basic operations of thermody- 
namical nature, involving a heat bath at temperature T: 



the erasure of a single qubit to a pure state at kT ln(2)
work cost, and the corresponding reverse process which
extracts kT ln(2) work by transforming a pure state into
a fully mixed state. Here k is the Boltzmann constant.
These operations are motivated by the variety of explicit 
physical thermodynamical frameworks in which they can 
be performed, for example using Szilard boxes [7, 18] or 
by isothermally manipulating energy levels of Hamilto- 
nians [5, 12, 20]. Crucially, we assume the second law of 
thermodynamics, and require that there exist no opera- 
tion that would allow us to form a cycle for which the 
net effect would be the extraction of work. This justifies 
that no other work extraction procedure can yield more 
work than kT ln(2) from a pure qubit, or else a cycle with 
net work gain could be formed by appending an erasure 
process, itself only costing kT ln(2).

Apart from this constraint on the set of allowed opera- 
tions, it is natural to also allow usual quantum informa- 
tion processing. Since our Hamiltonians are degenerate, 
we can allow all global unitaries and they cost no work. 
We do not need to use the fact that these unitaries are im- 
plementable by a device operating in contact with a heat 
bath, since expanding the class of allowable operations 
actually strengthens the bound we derive. In practice,
one has very crude local control over the operations, and
the acting agent does not know which unitary is being
implemented; however, this is actually not an obstacle
for implementation [11, 14]. In addition to unitaries, we
will allow pure ancillas to be added to the system, which 
permits more general computation. Crucially, ancillas 
will have to be restored to their initial pure state, so that 
it is not possible to "hide" a work cost in an ancilla that 
was left mixed. 

The following framework is motivated by the above 
considerations. The processes we allow are (finite) com- 
binations of the following elementary operations: 

(a) Bring n qubits (of the system X or an ancilla A)
from any state to a pure state ('erasure') at cost
n kT ln(2) work;

(b) Bring n qubits (of the system X or an ancilla A)
from a pure state to a fully mixed state while
extracting n kT ln(2) work;

(c) Add and remove ancillas in a pure state at no work 
cost, as long as all the ancillas have been restored 
to their initial pure state at the end of the process; 

(d) Perform arbitrary unitaries (over X and any added 
ancillas) at no work cost. 

Operations (a) and (b) are those of thermodynamical 
nature, and may be carried out in a wide range of existing 
frameworks as mentioned above. One may view these 
operations as defining a quantity which we call "work".

On the other hand, operations (c) and (d) are purely 
information-theoretical. They allow us to perform any 
quantum information processing circuit, since we allow 
pure ancillas to be added. However, there is the condi- 
tion that "randomness" may not be disposed of for free, 



namely that ancillas have to be restored to their initial 
pure states at the end of the process. 

Lambda-Majorization. — We will now provide a simple 
mathematical characterization of all operations allowed 
in our framework. 

First, note that the operations (a)-(d) allow the use 
of so-called noisy operations [13], which correspond to 
adding an ancilla system N in a fully mixed state, per- 
forming a joint unitary, and removing the ancilla. Specif- 
ically, a noisy operation is composed in our framework of 
first an operation of type (c) (adding a pure ancilla of n
qubits), followed by an operation of type (b) (extracting
n kT ln(2) work from the ancilla making it fully mixed),
then one of type (d) (performing the necessary unitary
to carry out the noisy operation), and finally an
operation of type (a) (erasing the ancilla back to its pure state
at a work cost n kT ln(2)). The total process has a work
balance of zero. This means that we may thus carry out 
noisy operations for free within our framework and use 
them as building blocks for more complex processes. In 
the following, we deal implicitly with the ancilla N and 
it should not be confused with further ancillas that will 
be added. 
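
The work accounting of operations (a)-(d), and the zero balance of a noisy operation, can be sketched as a simple ledger. The class and method names are hypothetical, and the ledger only tracks work in units of kT ln(2); it does not simulate the quantum states themselves.

```python
class WorkLedger:
    """Tracks net extracted work, in units of kT ln(2), under operations (a)-(d)."""

    def __init__(self):
        self.extracted = 0

    def erase(self, n_qubits):
        """Operation (a): erasing n qubits costs n units of work."""
        self.extracted -= n_qubits

    def randomize(self, n_qubits):
        """Operation (b): turning n pure qubits fully mixed yields n units."""
        self.extracted += n_qubits

    def add_pure_ancilla(self, n_qubits):
        """Operation (c): free, provided the ancilla is restored at the end."""

    def unitary(self):
        """Operation (d): free."""


def noisy_operation(ledger, n_qubits):
    """Compose (c), (b), (d), (a) as described in the text; returns the net work."""
    ledger.add_pure_ancilla(n_qubits)
    ledger.randomize(n_qubits)
    ledger.unitary()
    ledger.erase(n_qubits)
    return ledger.extracted


if __name__ == "__main__":
    print(noisy_operation(WorkLedger(), 5))
```

The gain from randomizing the ancilla is exactly cancelled by the cost of restoring it, which is why noisy operations can be used as free building blocks.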

The following result by Horodecki et al. [13] relates 
noisy operations to the mathematical notion of majoriza- 
tion [31, 32]. 

Noisy Operations and Majorization. The transition
on system X from state $\sigma$ to state $\rho$ is possible by noisy
operation if and only if $\sigma \succ \rho$.

Majorization between two (normalized) states $\sigma \succ \rho$
captures the fact that $\rho$ is "more mixed" than $\sigma$, or that
the eigenvalues of $\rho$ can be written as a "mixture" of the
eigenvalues of $\sigma$. Formally, majorization can be
characterized by the existence of a unital, trace-preserving
completely positive map that brings $\sigma$ to $\rho$ [33-36]. A
channel $\mathcal{E}$ is trace-preserving if
$\mathcal{E}^{\dagger}(\mathbb{1}) = \mathbb{1}$ and unital if
$\mathcal{E}(\mathbb{1}) = \mathbb{1}$.

Proposition 1. Two positive matrices $\sigma$ and $\rho$ satisfy
$\sigma \succ \rho$ if and only if there exists a trace-preserving, unital,
completely positive map $\mathcal{E}$ satisfying $\mathcal{E}(\sigma) = \rho$.

The notion of majorization is discussed in more detail 
in Appendix A. 
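
Since majorization between two states depends only on their eigenvalues, it can be tested numerically by comparing partial sums of the sorted spectra. The following is a minimal sketch of that standard test; the function name and tolerance are illustrative assumptions.

```python
import numpy as np


def majorizes(p, q, tol=1e-12):
    """Return True if the spectrum p majorizes the spectrum q.

    Both spectra are sorted in decreasing order and zero-padded to a common
    length; p majorizes q when every partial sum of p is at least the
    corresponding partial sum of q and the totals agree."""
    p = np.sort(np.asarray(p, dtype=float))[::-1]
    q = np.sort(np.asarray(q, dtype=float))[::-1]
    dim = max(len(p), len(q))
    p = np.pad(p, (0, dim - len(p)))
    q = np.pad(q, (0, dim - len(q)))
    if abs(p.sum() - q.sum()) > 1e-9:
        return False
    return bool(np.all(np.cumsum(p) >= np.cumsum(q) - tol))


if __name__ == "__main__":
    # A pure qubit majorizes a fully mixed one, but not the other way round.
    print(majorizes([1.0, 0.0], [0.5, 0.5]), majorizes([0.5, 0.5], [1.0, 0.0]))
```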

We will now provide some background insight for the 
meaning of our new concept of lambda-majorization. 
The idea is to characterize "how well" a state $\sigma$ majorizes
a state $\rho$. Suppose that we have a system X in state $\sigma_X$
and we want to bring it to the state $\rho_X$, where
$\sigma_X \succ \rho_X$. In this case, one can simply carry out a
noisy operation as described above. Suppose now that we have
an ancilla A that is in a fully mixed state, $\mathbb{1}_A/|A|$,
and suppose that we are fortunate enough for
$\sigma_X \otimes \mathbb{1}_A/|A| \succ \rho_X \otimes |0\rangle\langle0|_A$
to also hold (for some pure state $|0\rangle_A$ on A). Then by
applying a joint noisy operation on both systems, this would
correspond to actually erasing the system A "for free"
during the transition $\sigma \to \rho$. We could then say that
FIG. 2: Lambda-Majorization corresponds to absorbing a
certain amount of randomness from an ancilla during a unitary
operation. The system X starts in state $\sigma$, and the ancilla
A in a state with $\lambda_1$ fully mixed qubits with the remaining
qubits pure. The goal is to devise a global unitary that will
bring the system X to the state $\rho$, while leaving the least
possible number $\lambda_2$ of fully mixed qubits in A. The difference
$\lambda = \lambda_1 - \lambda_2$ is the work extracted by the process; if the
value is negative, it corresponds to a work cost. In the main
text, we allow a noisy operation instead of a unitary operation,
but one could simply add more mixed qubits to the ancilla on
each side and use those to implement a noisy operation with
a unitary.



the randomness of the ancilla A was "transferred" into
system X. We will view this type of transition as work
extraction on system X during a transition $\sigma_X \to \rho_X$.

In another situation, it might be that $\sigma_X \nsucc \rho_X$.
However, in that case, for a large enough ancilla A the
majorization $\sigma_X \otimes |0\rangle\langle0|_A \succ \rho_X \otimes \mathbb{1}_A/|A|$
will hold. The corresponding noisy operation then leaves us
with a mixed ancilla that started off pure; we will view such
a transition on system X as costing work.

Such operations can be performed within our frame- 
work, using operations (a)-(d). In particular, the rela- 
tion to work is given by elementary erasure and work 
extraction (operations (a) and (b)) applied to the ancilla 
A after the transition to restore it to its initial state. 

In general, the ancilla A may start with $\lambda_1$ mixed
qubits and end up with $\lambda_2$ mixed qubits after a noisy
operation; we consider in this case to have extracted
$(\lambda_1 - \lambda_2)\,kT\ln(2)$ amount of work. This situation is
depicted in Figure 2. Both considerations above about
work cost and work extraction are encompassed, sim- 
ply because we count the difference in the "amount of 
randomness" present in the ancilla before and after the 
process. This is the idea behind the concept of lambda- 
majorization, whose definition we can now state. 

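Lambda-Majorization. For two density operators $\sigma_X$,
$\rho_Y$ on two systems X and Y, we will say that $\sigma_X$
$\lambda$-majorizes $\rho_Y$, denoted by
$\sigma_X \xrightarrow{\lambda} \rho_Y$, if there exists a
(large enough) ancilla system A, as well as $\lambda_1, \lambda_2 \geq 0$
with $\lambda = \lambda_1 - \lambda_2$, such that

$$ 2^{-\lambda_1}\,\mathbb{1}_{2^{\lambda_1}} \otimes \sigma_X \;\succ\; 2^{-\lambda_2}\,\mathbb{1}_{2^{\lambda_2}} \otimes \rho_Y \,, $$

where $2^{-\lambda_1}\mathbb{1}_{2^{\lambda_1}}$ and $2^{-\lambda_2}\mathbb{1}_{2^{\lambda_2}}$
are fully mixed states on $\lambda_1$ (respectively $\lambda_2$) qubits of A,
and where the remaining qubits of A in each case are pure.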
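
For small systems, the definition can be explored by brute force. The sketch below restricts $\lambda_1$ and $\lambda_2$ to whole numbers of ancilla qubits and works with eigenvalue lists only; pure ancilla qubits enter implicitly through zero-padding. The function names and the search range `max_qubits` are illustrative assumptions, not the optimization method used in the paper.

```python
import numpy as np


def majorizes(p, q, tol=1e-12):
    """True if spectrum p majorizes spectrum q (sorted partial-sum test)."""
    p = np.sort(np.asarray(p, dtype=float))[::-1]
    q = np.sort(np.asarray(q, dtype=float))[::-1]
    dim = max(len(p), len(q))
    p = np.pad(p, (0, dim - len(p)))
    q = np.pad(q, (0, dim - len(q)))
    if abs(p.sum() - q.sum()) > 1e-9:
        return False
    return bool(np.all(np.cumsum(p) >= np.cumsum(q) - tol))


def lambda_majorizes(sigma, rho, lam1, lam2):
    """Check the defining majorization with lam1 mixed ancilla qubits on the
    input side and lam2 mixed ancilla qubits on the output side."""
    left = np.kron(np.full(2 ** lam1, 2.0 ** -lam1), sigma)
    right = np.kron(np.full(2 ** lam2, 2.0 ** -lam2), rho)
    return majorizes(left, right)


def best_integer_lambda(sigma, rho, max_qubits=6):
    """Largest lam1 - lam2 found in a bounded search, or None if none works."""
    best = None
    for lam1 in range(max_qubits + 1):
        for lam2 in range(max_qubits + 1):
            if lambda_majorizes(sigma, rho, lam1, lam2):
                lam = lam1 - lam2
                best = lam if best is None else max(best, lam)
    return best


if __name__ == "__main__":
    # Pure -> mixed qubit extracts one unit; mixed -> pure costs one unit.
    print(best_integer_lambda([1.0, 0.0], [0.5, 0.5]))
    print(best_integer_lambda([0.5, 0.5], [1.0, 0.0]))
```

The two test cases reproduce the elementary operations (b) and (a): randomizing a pure qubit yields one unit of kT ln(2), and erasing a mixed qubit costs one unit.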
An expression for "by how much" a state majorizes another
was originally introduced in [17] and used in [19], in
the context of work extraction games from Szilard boxes.
Their measure, the "relative mixedness" between $\sigma$ and
$\rho$, corresponds to the optimal $\lambda$ such that
$\sigma \xrightarrow{\lambda} \rho$.

Lambda-majorization captures the possible processes
that are allowed in our framework. Indeed, if
$\sigma \xrightarrow{\lambda} \rho$, then one has
$2^{-\lambda_1}\mathbb{1}_{2^{\lambda_1}} \otimes \sigma \succ 2^{-\lambda_2}\mathbb{1}_{2^{\lambda_2}} \otimes \rho$
for some $\lambda_1, \lambda_2$ with $\lambda = \lambda_1 - \lambda_2$. Hence, there
exists a noisy operation (itself a combination of operations
(a)-(d) with zero total work cost) that performs the transition
from $2^{-\lambda_1}\mathbb{1}_{2^{\lambda_1}} \otimes \sigma$ to
$2^{-\lambda_2}\mathbb{1}_{2^{\lambda_2}} \otimes \rho$. The $\lambda_1$ mixed
qubits that we have appended to $\sigma$ can be created by
appending a large pure ancilla (operation (c)), and using
operation (b) to extract $\lambda_1\,kT\ln(2)$ work from $\lambda_1$
qubits, rendering them fully mixed. At the end of the process,
after the noisy operation, we need to restore the ancilla in a
pure state; we thus need to erase (operation (a)) the
remaining $\lambda_2$ qubits, costing $\lambda_2\,kT\ln(2)$ work. The total
extracted work is then $(\lambda_1 - \lambda_2)\,kT\ln(2) = \lambda\,kT\ln(2)$.
Conversely, each individual operation (a)-(d), individually
transforming some state $\sigma$ into a state $\rho$ and costing
work W, implies the lambda-majorization
$\sigma \xrightarrow{\lambda} \rho$ with $W = -\lambda\,kT\ln(2)$.
This is clear for operations (c) and (d). For operations (a)
and (b), this follows from results derived in Appendix A 3.

The ancilla system above may be viewed as some kind
of "information battery", as proposed by Bennett [2],
who suggested using a blank memory tape as "fuel" to
extract work. In this case, the ancilla can be used as a 
storage of "purity" (or as a storage for "mixedness" or 
"randomness" which we would like to get rid of), which 
is increased or decreased by processes like the ones sug- 
gested above. 

It turns out that one can characterize lambda- 
majorization by the existence of a completely positive 
map satisfying some special normalization conditions, 
analogously to Proposition 1. 

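Proposition 2. Two normalized density matrices $\sigma_X$
and $\rho_Y$ on two systems X and Y satisfy
$\sigma_X \xrightarrow{\lambda} \rho_Y$ if and only if there exists a
completely positive map $\mathcal{T}_{X\to Y}$ satisfying
$\rho_Y = \mathcal{T}_{X\to Y}(\sigma_X)$, such that
$\mathcal{T}_{X\to Y}^{\dagger}(\mathbb{1}_Y) \leq \mathbb{1}_X$
and $\mathcal{T}_{X\to Y}(\mathbb{1}_X) \leq 2^{-\lambda}\,\mathbb{1}_Y$.

A channel $\mathcal{T}_{X\to Y}$ that satisfies the two last conditions
will be referred to as a lambda-majorization channel.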
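
The two normalization conditions can be checked directly for a map given in Kraus form. The sketch below assumes the map is specified by a list of Kraus matrices (a representation chosen here for convenience, not prescribed by the text) and returns the largest $\lambda$ compatible with the second condition, provided the first one holds. The two example maps are the elementary erasure and randomization of a qubit.

```python
import numpy as np


def lambda_of_map(kraus, tol=1e-9):
    """For T(rho) = sum_k K rho K^dagger, return the largest lam such that
    T(1) <= 2**-lam * 1, or None if T^dagger(1) <= 1 is violated."""
    t_of_identity = sum(K @ K.conj().T for K in kraus)       # T(1_X)
    t_dag_of_identity = sum(K.conj().T @ K for K in kraus)   # T^dagger(1_Y)
    if np.linalg.eigvalsh(t_dag_of_identity).max() > 1 + tol:
        return None
    return float(-np.log2(np.linalg.eigvalsh(t_of_identity).max()))


# Erasure of a qubit to |0>: Kraus operators |0><0| and |0><1|.
ERASE = [np.array([[1.0, 0.0], [0.0, 0.0]]),
         np.array([[0.0, 1.0], [0.0, 0.0]])]

# Map sending the known pure state |0> to the fully mixed state; it is
# trace-decreasing on inputs outside the support of |0><0|.
RANDOMIZE = [np.array([[1.0, 0.0], [0.0, 0.0]]) / np.sqrt(2),
             np.array([[0.0, 0.0], [1.0, 0.0]]) / np.sqrt(2)]

if __name__ == "__main__":
    print(lambda_of_map(ERASE), lambda_of_map(RANDOMIZE))
```

Erasure gives $\lambda = -1$ (one unit of work paid) and randomization gives $\lambda = 1$ (one unit extracted), consistent with operations (a) and (b).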

Furthermore, although the channel $\mathcal{T}$ is not directly
a physical channel (it can be, for example,
trace-decreasing), it can always be viewed as part of a
unital channel $\mathcal{E}$, in the sense that $\mathcal{T}$ can be obtained by
projection onto specific subspaces and tracing out the
ancilla A of the channel $\mathcal{E}$ (see Appendix A 2). In turn,
unital channels are a (strict [37]) superset of the noisy 
operations. Recall that our task is to find a lower bound 
on the work cost of all possible processes allowed in our 
framework, which we will do by optimizing the work cost 
over all processes that perform a given state transition. 




FIG. 3: Our main result gives a fundamental lower bound on
the work cost W of a process transforming a state $\sigma_X$
(purified by a fictitious $|\sigma\rangle_{XR}$) into a new state
$\rho_{XR}$ obtained by applying a process $\mathcal{E}_{X\to X}$. The
lower bound to the work cost is given by the entropy that the
process $\mathcal{E}$ has to dump into the environment E (in which
$\rho_{XR}$ is purified), as measured by the Rényi-zero
conditional entropy $H_0(E|X)_{\rho}$.



However, instead of considering only the unital channels
$\mathcal{E}$ that are noisy operations, we will relax this last
condition and consider all unital channels $\mathcal{E}$, and thus allow the
optimization to range over all $\mathcal{T}$ that satisfy the
conditions of the above proposition. This will make our lower
bound even stronger, by showing that the lower bound
still holds even if we relax somewhat the assumptions in
our framework.

Main Result. — We are now ready to derive our main 
result. Consider a system X in the state $\sigma_X$. This system
can always be purified by a reference system, R, in a pure
joint state $|\sigma\rangle_{XR}$.

Allowing actions defined by our framework on A, we
will study the transition of this state to a state $\rho_{XR}$, by
applying a process $\mathcal{T}_{X\to X}$. The systems are depicted in
Figure 3.

The task we would like to solve is the following. Given
$\sigma_X$ and a process $\mathcal{E}_{X\to X}$, and given a purification
$|\sigma\rangle_{XR}$ of $\sigma_X$ and an output state
$\rho_{XR} = \mathcal{E}(\sigma_{XR})$, we would like
to find the least amount of work W one has to pay for
any process in our framework that implements the action
of $\mathcal{E}$ on $\sigma$. As we have seen in the previous section, we
can formulate within our framework all possible processes
as lambda-majorizations, so our task is actually to find
the best $\lambda$ such that $\sigma_X \xrightarrow{\lambda} \rho_X$, with the
corresponding lambda-majorization channel $\mathcal{T}$ from Prop. 2
satisfying $\mathcal{T}(\sigma_{XR}) = \rho_{XR}$.

Our main result gives an upper bound on the optimal
amount of work that can be extracted by this transition,
or equivalently, a lower bound on the minimum amount
of work that will have to be paid in order to perform the
transition. The main result follows directly from the
following technical proposition.

We are given an input state $\sigma_X$ and a process $\mathcal{E}_{X\to X}$.
Let $|\sigma\rangle_{XR}$ be a purification of $\sigma_X$, and let
$\rho_{XR} = \mathcal{E}_{X\to X}(\sigma_{XR})$. Let also $\rho_{XRE}$
be a purification of $\rho_{XR}$ in an environment system E.
The Rényi-zero entropy $H_0(E|X)_{\rho}$ [26, 38] is defined by

$$ H_0(E|X)_{\rho} = \max_{\omega_X \geq 0,\ \mathrm{tr}\,\omega_X = 1} \log \mathrm{tr}\left[ \Pi_{XE}\, \omega_X \right] , \tag{3} $$

where $\Pi_{XE}$ is the projector on the support of $\rho_{XE}$.

Proposition 3. Then the $\lambda$-majorization
$\sigma_X \xrightarrow{\lambda} \rho_X$ holds, with the channel
$\mathcal{T}_{X\to X}$ from Prop. 2 satisfying
$\mathcal{T}(\sigma_{XR}) = \rho_{XR}$, if and only if
$\lambda \leq -H_0(E|X)_{\rho}$.

Main Result. Any process in our framework acting on
system X that implements the channel $\mathcal{E}$ when given
input $\sigma_X$ (or equivalently, that brings the state $\sigma_{XR}$ to the
state $\rho_{XR}$) has to cost at least $kT\ln(2) \cdot H_0(E|X)_{\rho}$ work.
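
For small examples the Rényi-zero conditional entropy can be evaluated numerically. The sketch below assumes the logarithm is taken in base 2 and uses the fact that maximizing a linear functional over density operators $\omega_X$ picks out the largest eigenvalue, so that $H_0(E|X)_\rho$ equals the base-2 logarithm of the largest eigenvalue of $\mathrm{tr}_E\,\Pi_{XE}$. The function names, the tensor ordering X then E, and the two example states are illustrative assumptions.

```python
import numpy as np


def support_projector(rho, tol=1e-10):
    """Projector onto the support of a positive semidefinite matrix."""
    vals, vecs = np.linalg.eigh(rho)
    kept = vecs[:, vals > tol]
    return kept @ kept.conj().T


def renyi_zero_conditional(rho_xe, dim_x, dim_e):
    """H_0(E|X) in bits for a state on X (first factor) and E (second factor)."""
    proj = support_projector(rho_xe)
    # Partial trace over E of the support projector.
    reduced = np.trace(proj.reshape(dim_x, dim_e, dim_x, dim_e), axis1=1, axis2=3)
    return float(np.log2(np.linalg.eigvalsh(reduced).max()))


# Erasure of a fully mixed qubit: the output X is |0><0| and the environment
# holds the discarded randomness, so rho_XE = |0><0| (x) 1/2.
RHO_XE_ERASURE = np.kron(np.diag([1.0, 0.0]), np.eye(2) / 2)

# Identity process on a fully mixed qubit: nothing goes to the environment,
# modelled here by a one-dimensional E.
RHO_XE_IDENTITY = np.eye(2) / 2

if __name__ == "__main__":
    print(renyi_zero_conditional(RHO_XE_ERASURE, 2, 2))
    print(renyi_zero_conditional(RHO_XE_IDENTITY, 2, 1))
```

Erasure yields one bit, recovering the kT ln(2) cost of Landauer's original statement, while the identity process yields zero.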

In other words, the minimal work cost of a transition
from $\sigma$ to $\rho$ is given by the amount of
(information-theoretic) entropy dumped into the environment,
conditioned on the output of the computation. This is
precisely the quantitative generalization to correlated
quantum systems of the original Landauer's principle [1].

It is worth noting that instead of specifying the
channel $\mathcal{E}$, we may also simply specify the output state
$\rho_{XR}$, which completely determines the process (on the
support of $\sigma_X$) since it is the Choi-Jamiolkowski state
corresponding to $\mathcal{E}$ rescaled by $\sigma_X$
($\rho_{XR} = \mathcal{E}(\sigma_{XR})$). One can thus
understand the input to the problem to actually be a
bipartite state $\rho_{XR}$, such that $\rho_X$ is the required output,
$\rho_R$ is the input that will be fed into the process, and any
correlations between X and R specify parts of the
output that we wish to be preserved and not be modified, or
thermalized, by the process.

The full proof of Prop. 3 is provided in the appendix. 
We provide the general idea of the proof in the following. 

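Proof Sketch of the Main Result. The main idea of the
proof is to write the optimization problem as a
semidefinite program for the variables $\alpha = 2^{-\lambda}$, $T_{XX'}$
(the Choi-Jamiolkowski representation of $\mathcal{T}_{X\to X'}$),
and the dual variables $\omega_{X'}$, $X_X$ and $Z_{X'R}$. Let
$(\cdot)^{t_X}$ denote the partial transpose operation on X.
The optimal extracted work $\lambda$ is given by the following
semidefinite program:

Primal

minimize: $\alpha$

subject to:

$\mathrm{tr}_X\left[T_{XX'}\right] \leq \alpha\,\mathbb{1}_{X'}$  :  $\omega_{X'}$
$\mathrm{tr}_{X'}\left[T_{XX'}\right] \leq \mathbb{1}_X$  :  $X_X$
$\mathrm{tr}_X\left[T_{XX'}\,\sigma_{XR}^{t_X}\right] = \rho_{X'R}$  :  $Z_{X'R}$

Dual

maximize: $\mathrm{tr}\left[Z_{X'R}\,\rho_{X'R}\right] - \mathrm{tr}\,X_X$

subject to:

$\mathrm{tr}\,\omega_{X'} \leq 1$
$\mathrm{tr}_R\left[\sigma_{XR}^{t_X}\,Z_{X'R}\right] \leq \mathbb{1}_X \otimes \omega_{X'} + X_X \otimes \mathbb{1}_{X'}$

The optimal value $\alpha = 2^{H_0(E|X')_{\rho}}$ is achieved (see
Appendix B) by the completely positive map
$\mathcal{T}_{X\to X'}(\cdot) = \mathrm{tr}_E\left[V_{X\to X'E}\,(\cdot)\,V_{X\to X'E}^{\dagger}\right]$,
where $V_{X\to X'E}$ is the partial isometry with minimal
support relating $|\sigma\rangle_{XR}$ to $|\rho\rangle_{X'RE}$ (both
being purifications of the same $\sigma_R = \rho_R$).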
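
The primal constraints can be checked numerically for a candidate Choi matrix without invoking an SDP solver. The sketch below verifies the normalization and output constraints and reports the smallest $\alpha$ compatible with the first constraint for that candidate. It only certifies a feasible point (an upper bound on the optimal $\alpha$), not optimality. The function name, the index ordering (X then X', and X then R), and the erasure example are illustrative assumptions.

```python
import numpy as np


def check_primal_point(choi, sigma_xr, rho_out, d_x, d_xp, d_r, tol=1e-9):
    """Return (feasible, alpha) for a candidate Choi matrix T_{XX'}.

    feasible: T >= 0, tr_{X'} T <= 1_X and tr_X[T sigma^{t_X}] = rho_{X'R}.
    alpha:    largest eigenvalue of tr_X T, the least alpha this T allows."""
    t = choi.reshape(d_x, d_xp, d_x, d_xp)
    s = sigma_xr.reshape(d_x, d_r, d_x, d_r)
    tr_x = np.trace(t, axis1=0, axis2=2)     # operator on X'
    tr_xp = np.trace(t, axis1=1, axis2=3)    # operator on X
    # tr_X[T sigma^{t_X}]: the partial transpose swaps the two X indices of
    # sigma, so both X index pairs contract in matching positions.
    out = np.einsum("xayb,xrys->arbs", t, s).reshape(d_xp * d_r, d_xp * d_r)
    feasible = (
        np.linalg.eigvalsh(choi).min() >= -tol
        and np.linalg.eigvalsh(tr_xp).max() <= 1 + tol
        and np.allclose(out, rho_out, atol=tol)
    )
    return bool(feasible), float(np.linalg.eigvalsh(tr_x).max())


# Example: erasing a fully mixed qubit purified by R.
PHI = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)        # (|00> + |11>)/sqrt(2)
SIGMA_XR = np.outer(PHI, PHI)
CHOI_ERASE = np.kron(np.eye(2), np.diag([1.0, 0.0]))      # 1_X (x) |0><0|_{X'}
RHO_OUT = np.kron(np.diag([1.0, 0.0]), np.eye(2) / 2)     # |0><0|_{X'} (x) 1_R/2

if __name__ == "__main__":
    print(check_primal_point(CHOI_ERASE, SIGMA_XR, RHO_OUT, 2, 2, 2))
```

For the erasure example the candidate is feasible with $\alpha = 2$, which corresponds to $\lambda = -1$, i.e. a cost of one unit of kT ln(2).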

While it is clear from the formulation of our problem
that $\mathcal{T}$ is already completely determined on the support
of $\sigma_X$ (expressed by the condition
$\mathcal{T}(\sigma_{XR}) = \rho_{XR}$), the
optimization over $\mathcal{T}$ is done in order to (at least formally)
find the optimal action on the complement of the support
of $\sigma_X$. Also, the formulation of a lambda-majorization
problem as a semidefinite program is a more general
toolbox that could be used in the case where the mapping
is not completely determined and where arbitrary
additional semidefinite conditions can be imposed at will.

Allowing a Probability of Error. A "smooth" version
of the result is straightforward to obtain. In this case, we
allow the actual process to not exactly implement $\mathcal{E}$, but
only approximate it well. The best strategy to detect this
failure is to prepare $|\sigma\rangle_{XR}$ and send $\sigma_X$ into the
process, and then perform a measurement on the output on XR.
To ensure the probability of error does not exceed $\epsilon$, the
trace distance between the ideal output of the process
$\rho_{XR}$ and the actual output $\hat\rho_{XR}$ must not exceed
$\epsilon$. We can apply our main result to the approximate
process that brings $\sigma$ to $\hat\rho$, and lower bound the work
cost of that process by

$$ W(\sigma \to \hat\rho) \geq H_0(E|X)_{\hat\rho} \cdot kT\ln(2) \geq H_{\max}(E|X)_{\hat\rho} \cdot kT\ln(2) \,, \tag{6} $$

where the second inequality is shown in [39] and involves
the max entropy measure $H_{\max}$ as defined in [27, 28]. For
any $\epsilon \geq 0$, the smooth max entropy is defined as

$$ H_{\max}^{\epsilon}(E|X)_{\rho} = \min_{\hat\rho \approx_{\epsilon} \rho}\ \max_{\tau_X \geq 0,\ \mathrm{tr}\,\tau_X = 1} \log F^2\left( \hat\rho_{EX},\ \mathbb{1}_E \otimes \tau_X \right) , \tag{7} $$

where the first optimization ranges over all $\hat\rho_{EX}$ such that
$F^2(\hat\rho, \rho) \geq 1 - \epsilon^2$ and where
$F(\rho, \hat\rho) = \left\| \sqrt{\rho}\sqrt{\hat\rho} \right\|_1$ is the
fidelity between the quantum states $\rho$ and $\hat\rho$ [40]. We
write $H_{\max}$ to indicate $H_{\max}^{\epsilon}$ with $\epsilon = 0$.

If we optimize (6) over all possible channels $\mathcal{T}$ that output such $\hat\rho_{XR}$, we obtain a bound on the extractable work with a probability of error $\epsilon$,

\begin{align}
W &\geq \min_{\hat\rho_{XR} \approx_\epsilon \rho_{XR}} H_{\max}(E|X)_{\hat\rho} \cdot kT\ln(2) \notag\\
&\geq \min_{\hat\rho_{XRE} \approx_{\bar\epsilon} \rho_{XRE}} H_{\max}(E|X)_{\hat\rho} \cdot kT\ln(2) \notag\\
&= H^{\bar\epsilon}_{\max}(E|X)_\rho \cdot kT\ln(2) , \tag{8}
\end{align}

where the first optimization ranges over all $\hat\rho_{XR}$ such that the trace distance $\frac{1}{2}\|\hat\rho_{XR} - \rho_{XR}\|_1 \leq \epsilon$, and where the second optimization ranges over all $\hat\rho_{XRE}$ such that $F^2(\hat\rho_{XRE}, \rho_{XRE}) \geq 1 - \bar\epsilon^2$, with $\bar\epsilon = \sqrt{2\epsilon}$.

Footnote (to the remark that additional semidefinite conditions can be imposed): For example, instead of fixing the process with $\mathcal{T}(\sigma_{XR}) = \rho_{XR}$, one may have instead required that $\mathcal{T}(\sigma_X) = \rho_X$ for given $\sigma_X$ and $\rho_X$, not specifying and optimizing over what happens to correlations between the input and the output (or, equivalently, one could optimize over $\rho_{XR}$ with fixed reductions $\rho_X$ and $\rho_R$). In that case, the semidefinite program can be used to obtain bounds to the optimal value. This also implies that the "relative mixedness" introduced in [19] can be formulated as a semidefinite program.

Tightness of the Bound. — The bound given in the main result is tight up to error terms of the order of $\log\frac{1}{\epsilon}$. Indeed, let us consider the following simple process: one appends a large enough ancilla $A_E$ in a pure state to the input, so that we have our systems in the state $|\sigma\rangle_{XRA_E} = |0\rangle_{A_E} \otimes |\sigma\rangle_{XR}$. Let us consider a purification $|\rho\rangle_{XRA_E}$ of $\rho_{XR}$. Since the reduced states on $R$ of both these states are the same, $\sigma_R = \rho_R$, there exists a unitary $U$ acting on $X \otimes A_E$ such that $|\rho\rangle_{XRA_E} = U|\sigma\rangle_{XRA_E}$. So we can apply this unitary onto our input at no work cost, and we are left with $|\rho\rangle_{XRA_E}$ on our systems. We then apply the protocol proposed by del Rio et al. [5] on the system $A_E$, using the system $X$ as a memory we have access to, in order to erase the ancilla $A_E$ back to a pure state. Recall that their process achieves this task without modifying the reduced state $\rho_{XR}$, and at a work cost $kT\ln(2)\,H^{\epsilon}_{\max}(A_E|X) + O(\log\frac{1}{\epsilon})$. It is also straightforward to note that their protocol can be carried out within our framework. Thus, up to error terms of the order of the logarithm of the error probability, our bound given by (8) is tight.

Special Cases. — From our main result we can recover several special cases of interest as corollaries.

Von Neumann Limit. As we have seen in the introduction, considerable previous work has focused on the limit cases where many i.i.d. systems are provided. In such a case, the process $\mathcal{E}^{\otimes n}$ is applied on $n$ independent copies of the input $\sigma^{\otimes n}$, and outputs $\rho^{\otimes n}$. Say we tolerate a probability of error $\epsilon$. We may simply apply our (smoothed) main result to get an expression for our bound on the work cost,

\[
W \geq H^{\bar\epsilon}_{\max}(E^n|X^n)_{\rho^{\otimes n}} \cdot kT\ln(2) . \tag{9}
\]

However, it is known that the smooth entropies converge to the von Neumann entropy in the i.i.d. limit [41],

\[
\lim_{\epsilon\to 0}\ \lim_{n\to\infty} \frac{1}{n} H^{\epsilon}_{\max}(E^n|X^n)_{\rho^{\otimes n}} = H(E|X)_\rho , \tag{10}
\]

which allows us to simplify the expression to $H(E|X)_\rho = H(EX)_\rho - H(X)_\rho = H(X)_\sigma - H(X)_\rho$, where the last equality holds because $\rho_{EX}$ and $\sigma_X$ have the same spectrum, being both purifications of the same $\rho_R = \sigma_R$. We conclude that in the asymptotic i.i.d. case, the work cost of such a process is simply given by the difference of entropy between the initial and final state,

\[
W \geq [H(\text{initial state}) - H(\text{final state})]\, kT\ln(2) . \tag{11}
\]

We emphasize that in this case the exact process is not relevant, and only the input and output states matter. If one considers the example given in the introduction with (a) the identity channel and (b) a replacement map, and applies these processes on $n$ independent copies of the distribution described in Figure 1, then in this regime both processes cost no work.
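As a rough numerical illustration of the i.i.d. bound (11), the sketch below evaluates $[H(\text{initial}) - H(\text{final})]\,kT\ln(2)$ from the eigenvalues of the initial and final states. The function names, the temperature of 300 K and the two example spectra are assumptions made for illustration only; they do not come from the text.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K (exact SI value)


def von_neumann_entropy(eigs):
    """Entropy in bits of a state with the given eigenvalues."""
    return -sum(p * math.log2(p) for p in eigs if p > 0)


def iid_work_bound(initial_eigs, final_eigs, temperature):
    """Per-copy bound (11): [H(initial) - H(final)] * kT ln(2), in joules."""
    delta_h = von_neumann_entropy(initial_eigs) - von_neumann_entropy(final_eigs)
    return delta_h * K_B * temperature * math.log(2)


# Erasing one uniformly random bit at an assumed temperature of 300 K.
landauer = iid_work_bound([0.5, 0.5], [1.0], 300.0)

# If the input and output spectra coincide, the averaged bound vanishes,
# whatever the process does to correlations with a reference system.
no_cost = iid_work_bound([0.5, 0.5], [0.5, 0.5], 300.0)

print(landauer, no_cost)
```

The second call reflects the remark above: in this regime only the input and output states enter, so two processes with the same input and output spectra are assigned the same averaged bound.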

Erasure of a Quantum System Using a Quantum Memory. Consider the setting proposed in [5], where a system $S$ is correlated to a system $M$ in a joint state $\sigma_{SM}$, and where our task is to erase $S$ while preserving the reduced state on $M$ and any possible correlations of $M$ with other systems. Formally, given a purification $\sigma_{SMR}$ of $\sigma_{SM}$, we are looking for a process that will bring this state to the state $\rho_{SMR} = |0\rangle\langle 0|_S \otimes \sigma_{MR}$, i.e. we require the process to preserve $\sigma_{MR}$. In [5] a process is proposed that performs this task at work cost

\[
kT\ln(2)\, H^{\epsilon}_{\max}(S|M)_\sigma + O(\log\tfrac{1}{\epsilon}) ,
\]

where $H^{\epsilon}_{\max}$ is the smooth max entropy [27-29].

This is a special case of the general case considered above, simply by considering $X$ to be the joint system of $S$ and the memory $M$, $\mathcal{H}_X = \mathcal{H}_S \otimes \mathcal{H}_M$. Note that we have $\rho_{SMR} = |0\rangle\langle 0|_S \otimes \sigma_{MR}$, purified by $|\rho\rangle_{SMRE} = |0\rangle_S \otimes |\rho\rangle_{MRE}$, where $|\rho\rangle_{MRE} = U_{S\to E}|\sigma\rangle_{SMR}$ and $U_{S\to E}$ is an isometry from $S$ to $E$.

Then the bound on the work cost, tolerating a probability of error of at most $\epsilon$, is

\begin{align}
W &\geq H^{\bar\epsilon}_{\max}(E|SM)_\rho \cdot kT\ln(2) \notag\\
&= H^{\bar\epsilon}_{\max}(E|M)_\rho \cdot kT\ln(2) \notag\\
&= H^{\bar\epsilon}_{\max}(S|M)_\sigma \cdot kT\ln(2) , \tag{12}
\end{align}

where the first equality follows because $\rho$ is pure on $S$ and the second by reversing the isometry $U$. We can immediately conclude that, within our framework, any process that performs this erasure has to cost at least $kT\ln(2)\,H^{\bar\epsilon}_{\max}(S|M)_\sigma$ work. Thus, the process proposed by del Rio et al. is optimal up to terms logarithmic in the error probability $\epsilon$. Note that if we take the memory $M$ to be trivial, i.e. a pure state, then we are in the standard scenario of Landauer erasure on a single system, and we have $W \geq H^{\bar\epsilon}_{\max}(S)$, which is achievable, recovering the result of [18].

State Transformation while Decoupling from the Reference System. Another special case that we can derive as a corollary is the process that erases its input and prepares the required output independently. This would occur if we required the output state to be completely uncorrelated to the reference system $R$. Being a replacement map, this process implies that $\rho_{XR} = \rho_X \otimes \rho_R$. In this case, any third party $R$ that would have been correlated to the input is now completely uncorrelated to the output.

Again, we may simply apply our main result with the additional condition that $\rho_{XR} = \rho_X \otimes \rho_R$. In this case, the purification $\rho_{XRE}$ of $\rho_{XR}$ takes a special form due to the tensor product structure, with the $E$ system split into two systems $E_R$ and $E_X$ ($E = E_R \otimes E_X$),

\[
|\rho\rangle_{XRE} = |\psi\rangle_{XE_X} \otimes |\phi\rangle_{RE_R} , \tag{13}
\]

where $|\psi\rangle_{XE_X}$ and $|\phi\rangle_{RE_R}$ are purifications of $\rho_X$ and $\rho_R$, respectively.

The lower bound on the work cost $W$, given by our main result and tolerating a probability of error of at most $\epsilon$, then reads

\[
W \geq H^{\bar\epsilon}_{\max}(E|X)_\rho = H^{\bar\epsilon}_{\max}(E_R)_{|\phi\rangle} + H^{\bar\epsilon}_{\max}(E_X|X)_{|\psi\rangle} ,
\]

where $\bar\epsilon = \sqrt{2\epsilon}$ and $H^{\bar\epsilon}_{\max}$ is again the smooth max entropy. Now, the spectrum of $\rho_{E_R}$ is exactly the same as the spectrum of $\rho_R$ by the Schmidt decomposition of $|\phi\rangle$. This in turn has the same spectrum as $\sigma_X$, also by the Schmidt decomposition of $\sigma_{XR}$ and because $\rho_R = \sigma_R$. It follows that $H^{\bar\epsilon}_{\max}(E_R)_\rho = H^{\bar\epsilon}_{\max}(X)_\sigma$. Also, by duality of smooth min and max entropies [27], we have $H^{\bar\epsilon}_{\max}(E_X|X)_{|\psi\rangle} = -H^{\bar\epsilon}_{\min}(E_X)_\rho = -H^{\bar\epsilon}_{\min}(X)_\rho$, where $H^{\bar\epsilon}_{\min}$ is the smooth min entropy with purified distance smoothing as defined in Ref. [28]. In consequence,

\[
W \geq H^{\bar\epsilon}_{\max}(X)_\sigma - H^{\bar\epsilon}_{\min}(X)_\rho . \tag{14}
\]

That is, to transform a state $\sigma$ to $\rho$ while maximally decoupling $\rho$ from the reference system, one has to erase $\sigma$ to a pure state (at cost $H^{\bar\epsilon}_{\max}(X)_\sigma$) and then prepare $\rho$ (extracting work $H^{\bar\epsilon}_{\min}(X)_\rho$).

Example: Erasing Part of the W State. — To illustrate 
some points mentioned above, consider the W state on a 
system S, a memory M and a reference system R given 
by 

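\[
|W\rangle_{SMR} = \frac{1}{\sqrt{3}}\left[\,|001\rangle + |010\rangle + |100\rangle\,\right]_{SMR} . \tag{15}
\]

The reduced states on $SM$ and $M$ are respectively given by $\sigma_{SM} = \frac{1}{3}|00\rangle\langle 00| + \frac{2}{3}|\Psi^+\rangle\langle\Psi^+|$ and $\sigma_M = \frac{2}{3}|0\rangle\langle 0| + \frac{1}{3}|1\rangle\langle 1|$, where $|\Psi^+\rangle = \frac{1}{\sqrt{2}}(|01\rangle + |10\rangle)$. By symmetry of the W state, the reduced states on any two qubits, and on any single qubit, have the same form.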

By actions on $S$ and $M$, we would like to erase $S$, leading to the final state on $S$ and $M$ given by $\rho_{SM} = |0\rangle\langle 0| \otimes \sigma_M$. Let us consider two processes that achieve this goal: the first one will preserve correlations with $R$ but will cost work, the second will not cost work but will modify those correlations.

We may directly apply the special case above concerning the erasure of a system conditioned on a memory: the fundamental work cost of such an erasure, if one preserves correlations with a reference system $R$, is given by $H_0(S|M)_\sigma$ (in units of $kT\ln(2)$). One may explicitly calculate (see Appendix C) in this case $H_0(S|M)_\sigma = \log\frac{3}{2} \approx 0.59$, and thus this process must cost at least this amount of work.
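The value $\log\frac{3}{2}$ can be checked numerically. The sketch below assumes the expression $H_0(S|M) = \log\|\mathrm{tr}_S\,\Pi_{SM}\|_\infty$, with $\Pi_{SM}$ the projector onto the support of $\sigma_{SM}$, which is the form that appears in (B5) of Appendix B. The helper names and the basis ordering (index $2s + m$ for $|s\,m\rangle$) are choices made for this illustration.

```python
import math


def outer(v):
    """Rank-one projector |v><v| for a real vector v."""
    return [[a * b for b in v] for a in v]


def add(m1, m2):
    return [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(m1, m2)]


def trace_out_first_qubit(m):
    """Partial trace over S of a 4x4 matrix in the basis |s m>, index 2*s + m."""
    return [[m[i][j] + m[2 + i][2 + j] for j in range(2)] for i in range(2)]


def max_eig_sym_2x2(m):
    """Largest eigenvalue of a real symmetric 2x2 matrix."""
    a, b, d = m[0][0], m[0][1], m[1][1]
    return (a + d) / 2 + math.sqrt(((a - d) / 2) ** 2 + b ** 2)


inv_sqrt2 = 1 / math.sqrt(2)
ket_00 = [1.0, 0.0, 0.0, 0.0]
ket_psi_plus = [0.0, inv_sqrt2, inv_sqrt2, 0.0]

# Projector onto the support of sigma_SM = 1/3 |00><00| + 2/3 |Psi+><Psi+|.
support_projector = add(outer(ket_00), outer(ket_psi_plus))
reduced_on_m = trace_out_first_qubit(support_projector)

h0_s_given_m = math.log2(max_eig_sym_2x2(reduced_on_m))
print(round(h0_s_given_m, 3))
```

Here $\mathrm{tr}_S\,\Pi_{SM} = \frac{3}{2}|0\rangle\langle 0| + \frac{1}{2}|1\rangle\langle 1|$, whose largest eigenvalue is $\frac{3}{2}$.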

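However, one may easily notice that both $\sigma_{SM}$ and $\sigma_M$ have the same spectrum $\{\frac{2}{3}, \frac{1}{3}\}$. This means that there exists a unitary $U$ that performs the erasure simply as $|0\rangle\langle 0| \otimes \sigma_M = U\sigma_{SM}U^\dagger$, and this unitary process does not cost any work. However, the correlations with $R$ are not preserved. Indeed, the unitary sends $|00\rangle$ to $|01\rangle$ and $|\Psi^+\rangle$ to $|00\rangle$, so one explicitly calculates that the state after the process is given by

\[
|\rho\rangle_{SMR} = U|\sigma\rangle_{SMR} = \frac{1}{\sqrt{3}}\left[\,|011\rangle + \sqrt{2}\,|000\rangle\,\right] = |0\rangle \otimes \frac{1}{\sqrt{3}}\left[\,|11\rangle + \sqrt{2}\,|00\rangle\,\right] .
\]

We notice that the reduced state on $M$ and $R$ is now pure and differs from the initial one, given by $\sigma_{MR} = \frac{1}{3}|00\rangle\langle 00| + \frac{2}{3}|\Psi^+\rangle\langle\Psi^+|$.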
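The change in the correlations with $R$ can be verified with a few lines of code. In the sketch below, the action of $U$ on $|00\rangle$ and $|\Psi^+\rangle$ is taken from the text; its action on the remaining two basis vectors ($|\Psi^-\rangle \mapsto |10\rangle$, $|11\rangle \mapsto |11\rangle$) is an assumed completion to a full unitary and does not affect the result, since the W state has no component there. Purity $\mathrm{tr}\,\rho_{MR}^2$ is used as a simple indicator of whether the state on $MR$ is pure.

```python
import math

INV_SQRT2 = 1 / math.sqrt(2)
INV_SQRT3 = 1 / math.sqrt(3)

# Amplitudes of the W state, keyed by the bits (s, m, r).
w_state = {(0, 0, 1): INV_SQRT3, (0, 1, 0): INV_SQRT3, (1, 0, 0): INV_SQRT3}

# U[row][col] on SM, basis index 2*s + m.
# U = |00><Psi+| + |01><00| + |10><Psi-| + |11><11|
# (the last two terms are the assumed completion).
U = [
    [0.0, INV_SQRT2, INV_SQRT2, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, INV_SQRT2, -INV_SQRT2, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]


def apply_on_sm(unitary, state):
    """Apply a 4x4 matrix to the SM part of a state on SMR."""
    out = {}
    for (s, m, r), amp in state.items():
        col = 2 * s + m
        for row in range(4):
            if unitary[row][col] != 0.0:
                key = (row // 2, row % 2, r)
                out[key] = out.get(key, 0.0) + unitary[row][col] * amp
    return out


def purity_of_mr(state):
    """tr[(rho_MR)^2] for a pure state on SMR with real amplitudes."""
    rho = {}
    for (s1, m1, r1), a in state.items():
        for (s2, m2, r2), b in state.items():
            if s1 == s2:
                key = ((m1, r1), (m2, r2))
                rho[key] = rho.get(key, 0.0) + a * b
    return sum(v * v for v in rho.values())


final_state = apply_on_sm(U, w_state)
purity_before = purity_of_mr(w_state)
purity_after = purity_of_mr(final_state)
print(purity_before, purity_after)
```

The purity on $MR$ goes from $\frac{5}{9}$ (the mixed state $\sigma_{MR}$) to $1$, in line with the statement that the reduced state on $M$ and $R$ becomes pure.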

Conclusion. — The last few years have seen enormous 
technological progress in micro- and nano-fabrication, 
making it possible to construct engines and thermo- 
devices on a microscopic scale [42-50]. In this regime, 
standard thermodynamic considerations (devised origi- 
nally for macroscopic devices such as steam engines) are 
not necessarily applicable. At the same time, with the 
miniaturization of computing circuits, thermodynamic 
aspects of information processing have become increas- 
ingly relevant. In fact, the heat dissipated by proces- 
sors is one of the main barriers limiting their perfor- 
mance. Along with these developments, researchers have 
started to investigate the laws of thermodynamics from 
an information-theoretic perspective [51-57]. 

The present work adds to this line of research, provid- 
ing a rigorous quantitative relationship between informa- 
tion theory and thermodynamics. One of our main find- 
ings is that this relationship is more intricate than what 
previous results (which focused on averaged quantities) 
may have suggested. In particular, it turns out that the 
thermodynamic cost of a given information-processing 
task not only depends on the input and output state, 
but also on the correlation between them. While this 
correlation-dependence disappears in certain asymptotic 
limits, it cannot be neglected in general and, in fact, may 
become arbitrarily large. 
APPENDIX



Appendix A: Formal Approach to 
Lambda-Majorization 

1. Preliminaries and Main Definition 

Let $\mathcal{H}_X$, $\mathcal{H}_Y$ be two subspaces of a finite-dimensional Hilbert space $\mathcal{H}_Z$, and let $\mathcal{H}_A$, $\mathcal{H}_B$ be two subspaces of a finite-dimensional Hilbert space $\mathcal{H}_C$. Let $d_{(\cdot)}$ denote the dimensions of the various Hilbert spaces and specifically let $d = d_Z = \dim\mathcal{H}_Z$. Denote by $\mathcal{L}(\mathcal{H})$ the set of linear hermitian operators on $\mathcal{H}$, by $\mathcal{P}(\mathcal{H})$ the set of positive semidefinite operators on $\mathcal{H}$, and by $\mathcal{S}_{=}(\mathcal{H})$ those operators in $\mathcal{P}(\mathcal{H})$ that have unit trace. Let also $\lambda_i(\rho)$ denote the $i$-th eigenvalue of $\rho$ (in no particular order), and $\lambda_i^{\downarrow}(\rho)$ denote the $i$-th eigenvalue of $\rho$ taken in decreasing order.

Majorization is discussed in detail in Refs. [31, 32, 58]. 

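Majorization. A matrix $\sigma \in \mathcal{P}(\mathcal{H}_Z)$ is said to majorize $\rho \in \mathcal{P}(\mathcal{H}_Z)$, denoted by $\sigma \succ \rho$, if for all $k$, $\sum_{i=1}^{k} \lambda_i^{\downarrow}(\sigma) \geq \sum_{i=1}^{k} \lambda_i^{\downarrow}(\rho)$, and if $\mathrm{tr}\,\sigma = \mathrm{tr}\,\rho$.

The notion of majorization defines a (partial) order relation on $\mathcal{P}(\mathcal{H}_Z)$. When considering the set of density matrices $\mathcal{S}_{=}(\mathcal{H}_Z)$, there is a "least" element: the fully mixed state, $\frac{1}{d}\mathbb{1}_Z$.

Weak Submajorization. A matrix $\sigma \in \mathcal{P}(\mathcal{H}_Z)$ is said to weakly submajorize $\rho \in \mathcal{P}(\mathcal{H}_Z)$, denoted by $\sigma \succ_w \rho$, if for all $k$, $\sum_{i=1}^{k} \lambda_i^{\downarrow}(\sigma) \geq \sum_{i=1}^{k} \lambda_i^{\downarrow}(\rho)$.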

Remark that if $\sigma, \rho \in \mathcal{S}_{=}(\mathcal{H}_Z)$, then the concept of weak submajorization is equivalent to regular majorization, simply because the traces of these matrices are already equal to unity.
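The two definitions only involve the ordered eigenvalues, so they are easy to test numerically. The following minimal sketch takes the spectra as plain lists; the function names and the numerical tolerance are choices made for this illustration, and the shorter list is padded with zeros so that operators of different rank can be compared.

```python
def partial_sums_desc(eigs, length):
    """Partial sums of the eigenvalues in decreasing order, zero-padded."""
    padded = sorted(eigs, reverse=True) + [0.0] * (length - len(eigs))
    sums, total = [], 0.0
    for value in padded:
        total += value
        sums.append(total)
    return sums


def weakly_submajorizes(sigma_eigs, rho_eigs, tol=1e-12):
    """True if every partial sum of sigma dominates the one of rho."""
    n = max(len(sigma_eigs), len(rho_eigs))
    s = partial_sums_desc(sigma_eigs, n)
    r = partial_sums_desc(rho_eigs, n)
    return all(a >= b - tol for a, b in zip(s, r))


def majorizes(sigma_eigs, rho_eigs, tol=1e-12):
    """Weak submajorization together with equal traces."""
    same_trace = abs(sum(sigma_eigs) - sum(rho_eigs)) <= tol
    return same_trace and weakly_submajorizes(sigma_eigs, rho_eigs, tol)


# A pure state majorizes the fully mixed qubit, but not the other way round.
print(majorizes([1.0], [0.5, 0.5]), majorizes([0.5, 0.5], [1.0]))

# With unequal traces only the weak relation can hold.
print(weakly_submajorizes([0.6, 0.4], [0.3, 0.2]), majorizes([0.6, 0.4], [0.3, 0.2]))
```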

Doubly Stochastic Matrix. A $d \times d$ matrix $S$ is doubly stochastic if $S_i^{\,j} \geq 0$, $\sum_i S_i^{\,j} = 1\ \forall j$ and $\sum_j S_i^{\,j} = 1\ \forall i$.

Doubly Substochastic Matrix. An $n \times m$ matrix $B$ is doubly substochastic if $B_i^{\,j} \geq 0$, $\sum_i B_i^{\,j} \leq 1$ and $\sum_j B_i^{\,j} \leq 1$.

The following theorem is due to Hardy, Littlewood and Pólya [59].

Theorem 4 (Hardy, Littlewood, and Pólya, 1929). Let $\sigma, \rho \in \mathcal{P}(\mathcal{H}_Z)$. Then $\sigma \succ \rho$ if and only if there exists a $d \times d$ doubly stochastic matrix $S_i^{\,j}$ such that $\lambda_i(\rho) = \sum_j S_i^{\,j}\,\lambda_j(\sigma)$.

A similar theorem is obtained for weak submajorization and doubly substochastic matrices [31].

Proposition 5. Let $\sigma \in \mathcal{P}(\mathcal{H}_X)$ and $\rho \in \mathcal{P}(\mathcal{H}_Y)$. Then $\sigma \succ_w \rho$ if and only if there exists a $d_Y \times d_X$ doubly substochastic matrix $B_i^{\,j}$ such that $\lambda_i(\rho) = \sum_j B_i^{\,j}\,\lambda_j(\sigma)$.



Majorization defines a partial order on states and has 
a "smallest" element, the fully mixed state. Also, a pure 
state majorizes any other state. 

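Proposition 6. Majorization is preserved by direct sums and tensor products, i.e. if $\sigma \succ \rho$ and $\sigma' \succ \rho'$, then $\sigma \oplus \sigma' \succ \rho \oplus \rho'$ and $\sigma \otimes \sigma' \succ \rho \otimes \rho'$. The same holds for weak submajorization.

A proof for the direct sum of two vectors can be found in [31, Cor. II.1.4]. We provide here an alternative proof along with the tensor product case.

Proof. Let $S_i^{\,j}$ and $S'^{\,j'}_{i'}$ be doubly stochastic matrices such that $\lambda_i(\rho) = \sum_j S_i^{\,j}\,\lambda_j(\sigma)$ and $\lambda_{i'}(\rho') = \sum_{j'} S'^{\,j'}_{i'}\,\lambda_{j'}(\sigma')$. Then $S \oplus S'$ is also doubly stochastic and satisfies $\lambda_i(\rho \oplus \rho') = \sum_j (S \oplus S')_i^{\,j}\,\lambda_j(\sigma \oplus \sigma')$, because the vectors of eigenvalues of the direct sum are simply the direct sums of the individual vectors of eigenvalues. This shows that $\sigma \oplus \sigma' \succ \rho \oplus \rho'$.

Analogously, $S \otimes S'$ satisfies

\[
\lambda_{ii'}(\rho \otimes \rho') = \lambda_i(\rho)\,\lambda_{i'}(\rho') = \sum_{jj'} S_i^{\,j}\, S'^{\,j'}_{i'}\,\lambda_j(\sigma)\,\lambda_{j'}(\sigma') = \sum_{jj'} (S \otimes S')_{ii'}^{\,jj'}\,\lambda_{jj'}(\sigma \otimes \sigma') .
\]

$S \otimes S'$ is doubly stochastic: $\sum_{ii'} (S \otimes S')_{ii'}^{\,jj'} = \sum_{ii'} S_i^{\,j}\, S'^{\,j'}_{i'} = 1$ and $\sum_{jj'} (S \otimes S')_{ii'}^{\,jj'} = 1$.

The same proof holds for doubly substochastic matrices, so majorization may be replaced by weak submajorization in the proposition. □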

We are now all set for a formal definition of lambda- 
majorization. 

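Let $\lambda \in \mathbb{R}$ and let $\lambda_1, \lambda_2 \geq 0$ be such that $\lambda = \lambda_1 - \lambda_2$ and $2^{\lambda_1}$, $2^{\lambda_2}$ are integers. (The case when $2^{\lambda}$ is irrational will be discussed later.) Take $\mathcal{H}_C$ of size greater than both $2^{\lambda_1}$ and $2^{\lambda_2}$ and let $\mathcal{H}_A$ and $\mathcal{H}_B$ be subspaces of $\mathcal{H}_C$ of respective dimensions $2^{\lambda_1}$ and $2^{\lambda_2}$.

Lambda-Majorization. For $\sigma \in \mathcal{P}(\mathcal{H}_X)$ and $\rho \in \mathcal{P}(\mathcal{H}_Y)$, we say that $\sigma$ $\lambda$-majorizes $\rho$, denoted by $\sigma \overset{\lambda}{\succ} \rho$, if there exist such $\lambda_1$, $\lambda_2$ such that $2^{-\lambda_1}\mathbb{1}_A \otimes \sigma \succ_w 2^{-\lambda_2}\mathbb{1}_B \otimes \rho$. Here $\mathbb{1}_A$, $\mathbb{1}_B$ are the projectors onto the respective subspaces $\mathcal{H}_A$ and $\mathcal{H}_B$ embedded in $\mathcal{H}_C$, of respective dimensions $2^{\lambda_1}$, $2^{\lambda_2}$. Likewise, $\sigma$ and $\rho$ are considered as living in $\mathcal{H}_Z$ by padding them with zero eigenvalues as necessary.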
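The definition can be turned into a direct numerical check on spectra. The sketch below tests the defining weak submajorization for one given pair of ancilla dimensions `dim_a` $= 2^{\lambda_1}$ and `dim_b` $= 2^{\lambda_2}$, so that $\lambda = \log_2(\texttt{dim\_a}/\texttt{dim\_b})$. The function names are hypothetical, and the check covers a single choice of dimensions rather than the existential quantifier in the definition.

```python
def weakly_submajorizes(sigma_eigs, rho_eigs, tol=1e-12):
    """Partial sums of sigma (decreasing order) dominate those of rho."""
    n = max(len(sigma_eigs), len(rho_eigs))
    s = sorted(sigma_eigs, reverse=True) + [0.0] * (n - len(sigma_eigs))
    r = sorted(rho_eigs, reverse=True) + [0.0] * (n - len(rho_eigs))
    acc_s = acc_r = 0.0
    for a, b in zip(s, r):
        acc_s += a
        acc_r += b
        if acc_s < acc_r - tol:
            return False
    return True


def with_uniform_ancilla(eigs, dim):
    """Spectrum of (1/dim) * identity_dim tensored with the given state."""
    return [p / dim for p in eigs for _ in range(dim)]


def lambda_majorizes(sigma_eigs, rho_eigs, dim_a, dim_b):
    """Defining relation for lambda = log2(dim_a) - log2(dim_b)."""
    lhs = with_uniform_ancilla(sigma_eigs, dim_a)
    rhs = with_uniform_ancilla(rho_eigs, dim_b)
    return weakly_submajorizes(lhs, rhs)


sigma = [0.7, 0.3]
pure = [1.0]
rho = [0.5, 0.25, 0.25]

# A rank-2 state reaches a pure state with lambda = -1, but not with lambda = 0.
print(lambda_majorizes(sigma, pure, 1, 2), lambda_majorizes(sigma, pure, 1, 1))

# A pure state reaches rho (largest eigenvalue 1/2) with lambda = 1, not lambda = 2.
print(lambda_majorizes(pure, rho, 2, 1), lambda_majorizes(pure, rho, 4, 1))
```

In the first example one bit of randomness has to be produced ($\lambda = -1$); in the second, one bit can be absorbed ($\lambda = 1$).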

We have assumed here that $2^{\lambda}$ is rational. If $2^{\lambda}$ is irrational, we say that $\sigma$ $\lambda$-majorizes $\rho$ if $\sigma \overset{\lambda'}{\succ} \rho$ for all $\lambda' < \lambda$ with $2^{\lambda'}$ rational.

The following proposition guarantees that the definition above does not depend on the exact values of $\lambda_1$ and $\lambda_2$ but only on their difference. This is the same as saying that a fully mixed state cannot act as a catalyst.

Proposition 7. For any $\sigma, \rho \in \mathcal{P}(\mathcal{H}_Z)$, and for any $n$, we have $\sigma \succ_w \rho$ if and only if $\sigma \otimes \frac{\mathbb{1}_n}{n} \succ_w \rho \otimes \frac{\mathbb{1}_n}{n}$.
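Proof. If $\sigma \succ_w \rho$, then the majorization passes over the tensor product, which proves the claim. Conversely, if $\sigma \otimes \frac{\mathbb{1}_n}{n} \succ_w \rho \otimes \frac{\mathbb{1}_n}{n}$, then in particular, for any $k \leq d$,

\[
\sum_{i=1}^{n\cdot k} \lambda_i^{\downarrow}\!\left(\tfrac{\mathbb{1}_n}{n} \otimes \sigma\right) \geq \sum_{i=1}^{n\cdot k} \lambda_i^{\downarrow}\!\left(\tfrac{\mathbb{1}_n}{n} \otimes \rho\right) . \tag{A1}
\]

($d$ is the maximum rank of $\sigma$ or $\rho$.) But the ordered spectrum of $\frac{\mathbb{1}_n}{n} \otimes \sigma$ consists of the values $\frac{1}{n}\lambda_j^{\downarrow}(\sigma)$, each repeated $n$ times, so that the first $n\cdot k$ eigenvalues sum to $\sum_{j=1}^{k}\lambda_j^{\downarrow}(\sigma)$, and thus

\[
\sum_{i=1}^{k} \lambda_i^{\downarrow}(\sigma) \geq \sum_{i=1}^{k} \lambda_i^{\downarrow}(\rho) . \qquad \Box
\]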

The following proposition is a direct consequence of the 
definition of lambda-majorization, and just states that 
you can move around randomness into or out of the an- 
cillas in the definition of lambda-majorization. 



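Proposition 8. For any $\sigma \in \mathcal{P}(\mathcal{H}_X)$, $\rho \in \mathcal{P}(\mathcal{H}_Y)$, and for any $\lambda \in \mathbb{R}$, $n > 0$, we have

\[
\tfrac{1}{n}\mathbb{1}_n \otimes \sigma \overset{\lambda}{\succ} \rho \quad\Longleftrightarrow\quad \sigma \overset{\lambda + \log n}{\succ} \rho
\]

and

\[
\sigma \overset{\lambda - \log n}{\succ} \rho \quad\Longleftrightarrow\quad \sigma \overset{\lambda}{\succ} \tfrac{1}{n}\mathbb{1}_n \otimes \rho .
\]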

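Similarly to Thm. 4 and to Prop. 5, it is possible to characterize lambda-majorization by the existence of a matrix relating the vectors of eigenvalues that satisfies some specific normalization conditions.

Proposition 9. Let $\sigma \in \mathcal{P}(\mathcal{H}_X)$ and $\rho \in \mathcal{P}(\mathcal{H}_Y)$. Then $\sigma \overset{\lambda}{\succ} \rho$ if and only if there exists a $d_Y \times d_X$ matrix $T_i^{\,k}$ such that $\lambda_i(\rho) = \sum_k T_i^{\,k}\,\lambda_k(\sigma)$, satisfying $T_i^{\,k} \geq 0$, $\sum_i T_i^{\,k} \leq 1$ and $\sum_k T_i^{\,k} \leq 2^{-\lambda}$.

Proof of Prop. 9. Suppose $2^{-\lambda_1}\mathbb{1}_A \otimes \sigma \succ_w 2^{-\lambda_2}\mathbb{1}_B \otimes \rho$ with $\lambda = \lambda_1 - \lambda_2$. Then there exists a doubly substochastic matrix $S_{bi}^{\,ak}$ such that

\[
\lambda_{bi}(2^{-\lambda_2}\mathbb{1}_B \otimes \rho) = \sum_{ak} S_{bi}^{\,ak}\,\lambda_{ak}(2^{-\lambda_1}\mathbb{1}_A \otimes \sigma) ,
\]

with $S_{bi}^{\,ak} \geq 0$, $\sum_{bi} S_{bi}^{\,ak} \leq 1$ and $\sum_{ak} S_{bi}^{\,ak} \leq 1$. (Indices $a$ and $b$ refer to the mixed ancillas of respective sizes $2^{\lambda_1}$ and $2^{\lambda_2}$. Since we are considering weak submajorization, we can safely ignore all zero eigenvalues and consider only the subspaces (of different sizes on the left and right hand side of the majorization) on which $\sigma$, $\rho$, $\mathbb{1}_A$ and $\mathbb{1}_B$ have support, as in Prop. 5.)

Now we have

\[
\lambda_i(\rho) = \sum_b 2^{-\lambda_2}\lambda_i(\rho) = \sum_{abk} S_{bi}^{\,ak}\, 2^{-\lambda_1}\lambda_k(\sigma) = \sum_k \Big( \sum_{ab} 2^{-\lambda_1} S_{bi}^{\,ak} \Big) \lambda_k(\sigma) ,
\]

so one can define

\[
T_i^{\,k} = \sum_{ab} 2^{-\lambda_1} S_{bi}^{\,ak} ,
\]

which fulfills $\lambda_i(\rho) = \sum_k T_i^{\,k}\,\lambda_k(\sigma)$. Because $S$ is doubly substochastic, and using the fact that indices $a$ (resp. $b$) range to $2^{\lambda_1}$ ($2^{\lambda_2}$), the matrix $T$ satisfies

\[
\sum_i T_i^{\,k} = \sum_{iab} 2^{-\lambda_1} S_{bi}^{\,ak} = \sum_a 2^{-\lambda_1} \sum_{bi} S_{bi}^{\,ak} \leq 1 ,
\]

as well as

\[
\sum_k T_i^{\,k} = \sum_{kab} 2^{-\lambda_1} S_{bi}^{\,ak} = \sum_b 2^{-\lambda_1} \sum_{ak} S_{bi}^{\,ak} \leq \sum_b 2^{-\lambda_1} = 2^{-\lambda} .
\]

Additionally, $T_i^{\,k} \geq 0$ because $S_{bi}^{\,ak} \geq 0$.

Conversely, suppose that a matrix $T_i^{\,k}$ exists, with $T_i^{\,k} \geq 0$, $\sum_i T_i^{\,k} \leq 1$, $\sum_k T_i^{\,k} \leq 2^{-\lambda}$, and $\lambda_i(\rho) = \sum_k T_i^{\,k}\,\lambda_k(\sigma)$. Let $\lambda_1, \lambda_2$ be such that $\lambda = \lambda_1 - \lambda_2$ and such that $2^{\lambda_1}$, $2^{\lambda_2}$ are integers. Then let $S_{bi}^{\,ak} = 2^{-\lambda_2}\, T_i^{\,k}$ for all $a$, $b$. Then $S_{bi}^{\,ak} \geq 0$ and $S$ satisfies

\[
\sum_{ak} S_{bi}^{\,ak} = 2^{-\lambda_2} \sum_{ak} T_i^{\,k} \leq 2^{-\lambda_2} \Big(\sum_a 1\Big)\, 2^{-\lambda} = 1 ,
\]

as well as

\[
\sum_{bi} S_{bi}^{\,ak} = 2^{-\lambda_2} \sum_{bi} T_i^{\,k} \leq 2^{-\lambda_2} \Big(\sum_b 1\Big) = 1 .
\]

The required weak submajorization for the desired lambda-majorization is provided by this doubly substochastic matrix,

\begin{align*}
\lambda_{bi}(2^{-\lambda_2}\mathbb{1}_B \otimes \rho) &= 2^{-\lambda_2}\lambda_i(\rho) = 2^{-\lambda_2}\sum_k T_i^{\,k}\,\lambda_k(\sigma) \\
&= 2^{-\lambda_2}\sum_k T_i^{\,k} \sum_a \lambda_{ak}(2^{-\lambda_1}\mathbb{1}_A \otimes \sigma) \\
&= \sum_{ak} S_{bi}^{\,ak}\,\lambda_{ak}(2^{-\lambda_1}\mathbb{1}_A \otimes \sigma) . \qquad \Box
\end{align*}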
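The conditions of Prop. 9 are straightforward to verify for a candidate matrix. The sketch below is a minimal checker operating on plain nested lists, with `T[i][k]` standing for $T_i^{\,k}$; the function name and tolerance are choices made for this illustration. The example uses the single-column matrix $T_i^{\,1} = \lambda_i(\rho)$, which relates a pure input state to a target $\rho$.

```python
def satisfies_prop9(T, sigma_eigs, rho_eigs, lam, tol=1e-12):
    """Check the conditions of Prop. 9 for a d_Y x d_X matrix T[i][k]."""
    rows, cols = len(rho_eigs), len(sigma_eigs)
    nonnegative = all(T[i][k] >= -tol for i in range(rows) for k in range(cols))
    # Column sums: sum over i of T_i^k must not exceed 1.
    columns_ok = all(
        sum(T[i][k] for i in range(rows)) <= 1 + tol for k in range(cols)
    )
    # Row sums: sum over k of T_i^k must not exceed 2^(-lambda).
    rows_ok = all(sum(T[i]) <= 2 ** (-lam) + tol for i in range(rows))
    # T must map the spectrum of sigma onto the spectrum of rho.
    maps_spectrum = all(
        abs(sum(T[i][k] * sigma_eigs[k] for k in range(cols)) - rho_eigs[i]) <= tol
        for i in range(rows)
    )
    return nonnegative and columns_ok and rows_ok and maps_spectrum


pure = [1.0]
rho = [0.5, 0.25, 0.25]
T = [[p] for p in rho]  # single column: T_i^1 = lambda_i(rho)

print(satisfies_prop9(T, pure, rho, 1))  # row sums are at most 1/2 = 2^(-1)
print(satisfies_prop9(T, pure, rho, 2))  # 1/2 exceeds 2^(-2)
```

The largest value of $\lambda$ accepted for this $T$ is $-\log\lambda_{\max}(\rho) = 1$.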
2. Formulation of Lambda-Majorization in Terms
of Channels 

Majorization can also be characterized in terms of uni- 
tal, trace-preserving completely positive maps [33-36] . 

Proposition 10. Two positive semidefinite matrices $\sigma$ and $\rho$ satisfy $\sigma \succ \rho$ if and only if there exists a trace-preserving, unital, completely positive map $\mathcal{E}$ satisfying $\mathcal{E}(\sigma) = \rho$.

Similarly, one can prove an analogous characterization 
of weak submajorization. The proof of this proposition 
will be given later. 

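Proposition 11. Let $\sigma \in \mathcal{P}(\mathcal{H}_X)$ and $\rho \in \mathcal{P}(\mathcal{H}_Y)$. Then $\sigma \succ_w \rho$ if and only if there exists a completely positive map $\mathcal{E}_{X\to Y} : \mathcal{L}(\mathcal{H}_X) \to \mathcal{L}(\mathcal{H}_Y)$ such that $\mathcal{E}_{X\to Y}(\sigma) = \rho$, with $\mathcal{E}$ satisfying $\mathcal{E}_{X\to Y}(\mathbb{1}_X) \leq \mathbb{1}_Y$ and $\mathcal{E}^\dagger_{X\to Y}(\mathbb{1}_Y) \leq \mathbb{1}_X$.

Let us say that $\mathcal{E}_{X\to Y}$ is subunital if $\mathcal{E}_{X\to Y}(\mathbb{1}_X) \leq \mathbb{1}_Y$. Then the two conditions on the structure of the channel $\mathcal{E}_{X\to Y}$ in the above proposition require the channel to be subunital and trace-nonincreasing.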

A subunital trace-nonincreasing completely positive map can always be seen as part of a unital, trace-preserving completely positive map on a larger Hilbert space. This is analogous to the result that doubly substochastic matrices are submatrices of doubly stochastic matrices [31].

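Proposition 12. Let $\mathcal{E}_{Z\to Z}$ be a unital, trace-preserving completely positive map. Let $\mathcal{H}_X$ and $\mathcal{H}_Y$ be two subspaces of $\mathcal{H}_Z$ and let $\mathbb{1}_X$ and $\mathbb{1}_Y$ be the projectors onto those spaces, respectively. Then the channel $\mathcal{E}'_{X\to Y}(\cdot) = \mathbb{1}_Y\,\mathcal{E}(\mathbb{1}_X \cdot \mathbb{1}_X)\,\mathbb{1}_Y$ is subunital and trace-nonincreasing.

Conversely, let $\mathcal{E}'_{X\to Y}$ be any trace-nonincreasing, subunital completely positive map. Let $\mathcal{H}_Z = \mathcal{H}_X \oplus \mathcal{H}_Y$, $G_Y = \mathbb{1}_Y - \mathcal{E}'_{X\to Y}(\mathbb{1}_X) \geq 0$, and $H_X = \mathbb{1}_X - \mathcal{E}'^{\dagger}(\mathbb{1}_Y) \geq 0$. Then the channel defined by

\begin{align*}
\mathcal{E}_{Z\to Z}(\cdot) ={}& 0_X \oplus \mathcal{E}'_{X\to Y}(\mathbb{1}_X (\cdot) \mathbb{1}_X) \\
&+ \mathcal{E}'^{\dagger}(\mathbb{1}_Y (\cdot) \mathbb{1}_Y) \oplus 0_Y \\
&+ (0_X \oplus \sqrt{G_Y})\,(\cdot)\,(0_X \oplus \sqrt{G_Y}) \\
&+ (\sqrt{H_X} \oplus 0_Y)\,(\cdot)\,(\sqrt{H_X} \oplus 0_Y)
\end{align*}

is unital and trace-preserving, and $\mathcal{E}'_{X\to Y}(\cdot) = \mathbb{1}_Y\,\mathcal{E}(\mathbb{1}_X (\cdot) \mathbb{1}_X)\,\mathbb{1}_Y$.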

In order to generalize this concept to our lambda-majorization, let us introduce the concept of an $\alpha$-subunital map, which generalizes the notion of subunital maps to arbitrary normalizations.

$\alpha$-subunital Maps. We call a map $\mathcal{T}_{X\to Y}$ $\alpha$-subunital if it satisfies $\mathcal{T}_{X\to Y}(\mathbb{1}_X) \leq \alpha\,\mathbb{1}_Y$.



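Proposition 13 (Composition of $\alpha$-subunital maps). Let $\mathcal{H}_W \subseteq \mathcal{H}_Z$ be another subspace of $\mathcal{H}_Z$ in addition to $\mathcal{H}_X$ and $\mathcal{H}_Y$, and let $\mathcal{T}_{X\to Y}$, $\mathcal{T}_{Y\to W}$ be trace-nonincreasing maps. Assume that $\mathcal{T}_{X\to Y}$ is $\alpha$-subunital and that $\mathcal{T}_{Y\to W}$ is $\beta$-subunital. Then their composition $[\mathcal{T}_{Y\to W} \circ \mathcal{T}_{X\to Y}]_{X\to W}$ is $\alpha\cdot\beta$-subunital.

Proof of Prop. 13. The composition of $\mathcal{T}_{X\to Y}$ and $\mathcal{T}_{Y\to W}$ is trace-nonincreasing,

\[
\mathcal{T}^\dagger_{X\to Y}\big(\mathcal{T}^\dagger_{Y\to W}(\mathbb{1}_W)\big) \leq \mathcal{T}^\dagger_{X\to Y}(\mathbb{1}_Y) \leq \mathbb{1}_X .
\]

Their composition is also $\alpha\cdot\beta$-subunital,

\[
\mathcal{T}_{Y\to W}\big(\mathcal{T}_{X\to Y}(\mathbb{1}_X)\big) \leq \mathcal{T}_{Y\to W}(\alpha\,\mathbb{1}_Y) \leq \alpha\beta\,\mathbb{1}_W . \qquad \Box
\]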

We will now give proofs for Props. 11 and 12, which 
rely on the following lemma. 

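Lemma 14. Let $\mathcal{T}_{Z\to Z}$ be a trace-nonincreasing map that is $2^{-\lambda}$-subunital. Denote by $\mathbb{1}_X$ (resp. $\mathbb{1}_Y$) the projectors onto the subspaces $\mathcal{H}_X$ (resp. $\mathcal{H}_Y$) of $\mathcal{H}_Z$. Then $\mathcal{T}_{X\to Y}$, defined by $\mathcal{T}_{X\to Y}(\cdot) = \mathbb{1}_Y\,\mathcal{T}_{Z\to Z}(\mathbb{1}_X (\cdot) \mathbb{1}_X)\,\mathbb{1}_Y$, is also a trace-nonincreasing $2^{-\lambda}$-subunital map.

Proof of Lemma 14. It suffices to note that the projection map $(\cdot) \to \mathbb{1}_X (\cdot) \mathbb{1}_X$ (resp. $(\cdot) \to \mathbb{1}_Y (\cdot) \mathbb{1}_Y$) is trace-nonincreasing and subunital. Then apply Prop. 13 twice. □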

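Proof of Prop. 12. The first part of the proposition follows from the lemma. To prove the converse, let $\mathcal{E}_{Z\to Z}$ be as in the proposition text, and notice first that the channel is its own adjoint:

\begin{align}
\mathcal{E}^\dagger_{Z\to Z}(\cdot) ={}& \mathcal{E}'^{\dagger}(\mathbb{1}_Y (\cdot) \mathbb{1}_Y) \oplus 0_Y + 0_X \oplus \mathcal{E}'(\mathbb{1}_X (\cdot) \mathbb{1}_X) \notag\\
&+ (0_X \oplus \sqrt{G_Y})\,(\cdot)\,(0_X \oplus \sqrt{G_Y}) \notag\\
&+ (\sqrt{H_X} \oplus 0_Y)\,(\cdot)\,(\sqrt{H_X} \oplus 0_Y) \notag\\
={}& \mathcal{E}_{Z\to Z}(\cdot) . \tag{A2}
\end{align}

The map is unital:

\begin{align*}
\mathcal{E}_{Z\to Z}(\mathbb{1}_Z) &= 0_X \oplus (\mathbb{1}_Y - G_Y) + (\mathbb{1}_X - H_X) \oplus 0_Y + 0_X \oplus G_Y + H_X \oplus 0_Y = \mathbb{1}_Z ,
\end{align*}

and it is thus trace-preserving because of (A2). The last condition, $\mathcal{E}'_{X\to Y}(\cdot) = \mathbb{1}_Y\,\mathcal{E}_{Z\to Z}(\mathbb{1}_X (\cdot) \mathbb{1}_X)\,\mathbb{1}_Y$, is obvious from the definition of $\mathcal{E}_{Z\to Z}$. □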

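Proof of Prop. 11. By the weak submajorization condition, if $\mathrm{tr}\,\rho \neq \mathrm{tr}\,\sigma$, we must have $\mathrm{tr}\,\rho < \mathrm{tr}\,\sigma$. Consider an extension space $\mathcal{H}_{Y'} \subseteq \mathcal{H}_Z$ (consider a larger $\mathcal{H}_Z$ if necessary) in which we extend $\rho$ by many small eigenvalues such that $\mathrm{tr}\,\rho_{Y\oplus Y'} = \mathrm{tr}\,\sigma$, while still having $\sigma \succ_w \rho_{Y\oplus Y'}$. Now we have a (regular) majorization, $\sigma \succ \rho_{Y\oplus Y'}$, and can apply Prop. 10.

The obtained map, $\mathcal{E}_{Z\to Z}$, is then unital and trace-preserving. It can be restricted by projecting the input onto $\mathcal{H}_X$ and the output onto $\mathcal{H}_Y$,

\[
\mathcal{E}_{X\to Y}(\cdot) = \mathbb{1}_Y\,\mathcal{E}_{Z\to Z}(\mathbb{1}_X (\cdot) \mathbb{1}_X)\,\mathbb{1}_Y .
\]

This restricted operator, by the lemma, is a valid trace-nonincreasing subunital map (take $\lambda = 0$).

Conversely, if $\mathcal{E}_{X\to Y}$ is a subunital trace-nonincreasing completely positive map with $\mathcal{E}_{X\to Y}(\sigma) = \rho$, then one can dilate it with Proposition 12 to a unital, trace-preserving completely positive map $\mathcal{E}_{Z\to Z}$ such that $\mathbb{1}_Y\,\mathcal{E}_{Z\to Z}(\sigma \oplus 0_Y)\,\mathbb{1}_Y = \rho$. Note also that the map $(\cdot) \mapsto \mathbb{1}_Y (\cdot) \mathbb{1}_Y + \mathbb{1}_X (\cdot) \mathbb{1}_X$ is a pinching [31, p. 50, Prob. II.5.5], so we have

\begin{align*}
\sigma \oplus 0_Y &\succ \mathcal{E}_{Z\to Z}(\sigma \oplus 0_Y) \\
&\succ \mathbb{1}_X\,\mathcal{E}_{Z\to Z}(\sigma \oplus 0_Y)\,\mathbb{1}_X + \mathbb{1}_Y\,\mathcal{E}_{Z\to Z}(\sigma \oplus 0_Y)\,\mathbb{1}_Y \\
&\succ_w \mathbb{1}_Y\,\mathcal{E}_{Z\to Z}(\sigma \oplus 0_Y)\,\mathbb{1}_Y = \rho .
\end{align*}

The last weak submajorization holds because some eigenvalues were left out. □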

In the same way as lambda-majorization can be characterized with differently normalized doubly substochastic maps, it can also be characterized in terms of a differently normalized subunital channel.

Proposition 15. Let $\sigma \in \mathcal{P}(\mathcal{H}_X)$, $\rho \in \mathcal{P}(\mathcal{H}_Y)$ and $\lambda \in \mathbb{R}$. Then $\sigma \overset{\lambda}{\succ} \rho$ if and only if there exists a completely positive map $\mathcal{T}_{X\to Y} : \mathcal{L}(\mathcal{H}_X) \to \mathcal{L}(\mathcal{H}_Y)$ such that $\mathcal{T}_{X\to Y}(\sigma) = \rho$, that is $2^{-\lambda}$-subunital and trace-nonincreasing.

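Proof of Prop. 15. "$\Rightarrow$". Assume first that $2^{-\lambda_1}\mathbb{1}_A \otimes \sigma \succ_w 2^{-\lambda_2}\mathbb{1}_B \otimes \rho$, with $\mathcal{H}_A$, $\mathcal{H}_B$ (of respective sizes $2^{\lambda_1}$ and $2^{\lambda_2}$) being subsystems of an ancilla system $\mathcal{H}_C$, with $\lambda = \lambda_1 - \lambda_2$.

By Prop. 11, there exists a subunital trace-nonincreasing completely positive map $\mathcal{E}_{AX\to BY}$, such that

\[
\mathcal{E}_{AX\to BY}(2^{-\lambda_1}\mathbb{1}_A \otimes \sigma_X) = 2^{-\lambda_2}\mathbb{1}_B \otimes \rho_Y . \tag{A3}
\]

Now let the map $\mathcal{T}$ be defined by

\[
\mathcal{T}_{X\to Y}(\cdot) = \mathrm{tr}_B\big[\mathcal{E}_{AX\to BY}(2^{-\lambda_1}\mathbb{1}_A \otimes (\cdot))\big] . \tag{A4}
\]

This map is trace-nonincreasing,

\[
\mathcal{T}^\dagger_{X\to Y}(\mathbb{1}_Y) = 2^{-\lambda_1}\,\mathrm{tr}_A\big[\mathcal{E}^\dagger_{AX\to BY}(\mathbb{1}_{BY})\big] \leq 2^{-\lambda_1}\,\mathrm{tr}_A(\mathbb{1}_{AX}) = \mathbb{1}_X ,
\]

and $2^{-\lambda}$-subunital,

\[
\mathcal{T}_{X\to Y}(\mathbb{1}_X) = 2^{-\lambda_1}\,\mathrm{tr}_B\big[\mathcal{E}(\mathbb{1}_{AX})\big] \leq 2^{-\lambda_1}\,\mathrm{tr}_B\,\mathbb{1}_{BY} = 2^{-\lambda}\,\mathbb{1}_Y .
\]

The map $\mathcal{T}$ brings $\sigma$ to $\rho$,

\[
\mathcal{T}_{X\to Y}(\sigma_X) = \mathrm{tr}_B\big[\mathcal{E}(2^{-\lambda_1}\mathbb{1}_A \otimes \sigma_X)\big] = \mathrm{tr}_B\big(2^{-\lambda_2}\mathbb{1}_B \otimes \rho_Y\big) = \rho_Y ,
\]

so that $\mathcal{T}$ satisfies all the claimed properties.

"$\Leftarrow$". To prove the converse, assume that a trace-nonincreasing, $2^{-\lambda}$-subunital map $\mathcal{T}_{X\to Y}$ exists, such that $\mathcal{T}_{X\to Y}(\sigma) = \rho$.

Choose $\lambda_1, \lambda_2$ such that $\lambda = \lambda_1 - \lambda_2$ and such that $2^{\lambda_1}$, $2^{\lambda_2}$ are integers. (Again, in case $2^{\lambda}$ is irrational, approximate $2^{\lambda}$ arbitrarily well by rational numbers $2^{\lambda'}$.) Choose $\mathcal{H}_C$ large enough to contain two subspaces $\mathcal{H}_A$ and $\mathcal{H}_B$ of respective dimensions $2^{\lambda_1}$ and $2^{\lambda_2}$. Let

\[
\mathcal{E}_{AX\to BY}(\cdot) = 2^{-\lambda_2}\,\mathbb{1}_B \otimes \mathcal{T}_{X\to Y}(\mathrm{tr}_A(\cdot)) . \tag{A5}
\]

This map is trace-nonincreasing,

\[
\mathcal{E}^\dagger(\mathbb{1}_{BY}) = 2^{-\lambda_2}\,\mathbb{1}_A \otimes \mathcal{T}^\dagger(\mathrm{tr}_B\,\mathbb{1}_{BY}) = 2^{-\lambda_2}\,\mathbb{1}_A \otimes \mathcal{T}^\dagger(2^{\lambda_2}\,\mathbb{1}_Y) \leq \mathbb{1}_{AX} ,
\]

and subunital,

\[
\mathcal{E}(\mathbb{1}_{AX}) = 2^{-\lambda_2}\,\mathbb{1}_B \otimes \mathcal{T}(\mathrm{tr}_A\,\mathbb{1}_{AX}) = 2^{-\lambda_2}\,\mathbb{1}_B \otimes \mathcal{T}(2^{\lambda_1}\,\mathbb{1}_X) \leq \mathbb{1}_{BY} ,
\]

since $\lambda = \lambda_1 - \lambda_2$ and $\mathcal{T}$ is $2^{-\lambda}$-subunital. Also,

\begin{align*}
\mathcal{E}(2^{-\lambda_1}\mathbb{1}_A \otimes \sigma_X) &= 2^{-\lambda_2}\,\mathbb{1}_B \otimes \mathcal{T}\big(\mathrm{tr}_A(2^{-\lambda_1}\mathbb{1}_A \otimes \sigma_X)\big) \\
&= 2^{-\lambda_2}\,\mathbb{1}_B \otimes \mathcal{T}(\sigma_X) = 2^{-\lambda_2}\,\mathbb{1}_B \otimes \rho_Y .
\end{align*}

By Prop. 11, we eventually have

\[
2^{-\lambda_1}\mathbb{1}_A \otimes \sigma_X \succ_w 2^{-\lambda_2}\mathbb{1}_B \otimes \rho_Y . \qquad \Box
\]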



Remark 16. A trace-nonincreasing, $2^{-\lambda}$-subunital completely positive map $\mathcal{T}_{X\to Y}$ can always be written as in Eq. (A4) for a subunital trace-nonincreasing completely positive map $\mathcal{E}_{AX\to BY}$, which itself can always be written as projections of a unital map $\mathcal{E}_{CZ\to CZ}$ (see text of the previous proof, and Prop. 12).

Conversely, for any unital map $\mathcal{E}_{CZ\to CZ}$ with $\mathcal{E}(2^{-\lambda_1}\mathbb{1}_A \otimes \sigma_X) = 2^{-\lambda_2}\mathbb{1}_B \otimes \rho_Y$, in particular for any noisy operation in our framework, the map $\mathcal{T}$ obtained by Eq. (A4) is trace-nonincreasing and $2^{-\lambda}$-subunital.

In particular, for our purposes of optimizing $\lambda$ over all possible processes of our framework with an additional condition on the channel carrying out the process (namely to preserve correlations between our system $X$ and the reference system $R$), we may impose that condition directly on the channel $\mathcal{T}$ to obtain an upper bound on $\lambda$.



3. Properties for quantum states 

We will consider in this section some useful properties of lambda-majorization in the case where we consider normalized states $\sigma$, $\rho$. Here, weak submajorization automatically implies (regular) majorization because $\mathrm{tr}\,\sigma = \mathrm{tr}\,\rho = 1$.

In this section, let $\sigma \in \mathcal{S}_{=}(\mathcal{H}_X)$ and $\rho \in \mathcal{S}_{=}(\mathcal{H}_Y)$.

Proposition 17 (Lambda-Majorizing a Pure State). For any pure state $|0\rangle \in \mathcal{H}_Y$, we have $\sigma \overset{\lambda}{\succ} |0\rangle\langle 0|$ if and only if $\mathrm{rank}\,\sigma \leq 2^{-\lambda}$ (obviously $\lambda$ has to be negative or zero). Equivalently, $\sigma \succ \frac{1}{n}\mathbb{1}_n$ if and only if $\mathrm{rank}\,\sigma \leq n$.

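Proof of Prop. 17. Assume first that $\sigma \overset{\lambda}{\succ} |0\rangle\langle 0|$. Here $\mathcal{H}_Y$ is the one-dimensional space spanned by $|0\rangle$, and take $\mathcal{H}_X$ to be the subspace on which $\sigma$ has its support. By Prop. 9 there exists a single-row matrix $T_i^{\,k}$ satisfying $T_i^{\,k} \geq 0$, $\sum_i T_i^{\,k} = T_{i=1}^{\,k} \leq 1\ \forall k$, $\sum_k T_i^{\,k} \leq 2^{-\lambda}$, such that $1 = \lambda_{i=1}(|0\rangle\langle 0|) = \sum_k T_{i=1}^{\,k}\,\lambda_k(\sigma)$. We also have $\lambda_k(\sigma) > 0$ because $\sigma$ has only nonzero eigenvalues in $\mathcal{H}_X$. Then $\sum_k T_{i=1}^{\,k}\,\lambda_k(\sigma) = 1 = \sum_k \lambda_k(\sigma)$ implies $T_{i=1}^{\,k} = 1\ \forall k$. That is, the condition $\sum_k T_{i=1}^{\,k} \leq 2^{-\lambda}$ forces $T_{i=1}^{\,k}$ to have at most $2^{-\lambda}$ elements, i.e. the rank of $\sigma$ may not exceed $2^{-\lambda}$.

The converse holds because any state majorizes a uniform state of the same rank. □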

Proposition 18 (Condition on Support Sizes for Lambda-Majorization). If $\sigma \overset{\lambda}{\succ} \rho$, then $\mathrm{rank}\,\sigma \leq 2^{-\lambda}\,\mathrm{rank}\,\rho$.

Proof of Prop. 18. Notice that $\rho \succ \frac{1}{\mathrm{rank}\,\rho}\mathbb{1}_{\mathrm{rank}\,\rho}$, and thus $\sigma \overset{\lambda}{\succ} \frac{1}{\mathrm{rank}\,\rho}\mathbb{1}_{\mathrm{rank}\,\rho}$. Then, by Prop. 8 we have

\[
\sigma \overset{\lambda - \log \mathrm{rank}\,\rho}{\succ} |0\rangle\langle 0| ;
\]

it remains to apply Prop. 17. □

Proposition 19 (Being Lambda-Majorized by a Pure State). Let the state $\rho$ have maximum eigenvalue $\lambda_{\max}(\rho)$. For any pure state $|0\rangle$, we have $|0\rangle\langle 0| \overset{\lambda}{\succ} \rho$ if and only if $\lambda_{\max}(\rho) \leq 2^{-\lambda}$. Equivalently, $\frac{1}{n}\mathbb{1}_n \succ \rho$ if and only if $\lambda_{\max}(\rho) \leq \frac{1}{n}$.

Proof of Prop. 19. Let $T_i^{\,k}$ be as in Prop. 9. Note that here $k$ only takes the value 1, because the input space $\mathcal{H}_X$ is the one-dimensional space spanned by $|0\rangle$. Then $\lambda_i(\rho) = \sum_k T_i^{\,k}\,\lambda_k(|0\rangle\langle 0|) = T_i^{\,k=1}$ and thus $T_i^{\,k=1} = \lambda_i(\rho)$. Then $2^{-\lambda} \geq \sum_k T_i^{\,k} = T_i^{\,k=1} = \lambda_i(\rho)$ for all $i$. In particular, $2^{-\lambda} \geq \lambda_{\max}(\rho)$.

Conversely, if $\lambda_{\max}(\rho) \leq 2^{-\lambda}$, then let $T_i^{\,k=1} = \lambda_i(\rho)$. This matrix $T$ satisfies the conditions in Prop. 9 and thus $|0\rangle\langle 0| \overset{\lambda}{\succ} \rho$. □

4. Optimal Lambda Majorization for Normalized States and Relation to Single-Shot Entropy Measures

Define the absorbed randomness (or relative mixedness [19]) of a transition from $\sigma$ to $\rho$ as the maximal amount of randomness that you can get rid of, or the minimal amount of randomness that you have to generate, in a noisy operation process:

\[
R(\sigma \to \rho) = \sup\,\{\lambda : \sigma \overset{\lambda}{\succ} \rho\} . \tag{A6}
\]

Recent work has shown that this measure is relevant for the amount of extractable work of processes acting on arrays of Szilard boxes [19].

The absorbed randomness has some tight relations to single-shot entropy measures, which we present here. These are reformulations of results shown in [17, 18].

Proposition 20. The absorbed randomness defined above satisfies the following bounds:

\[
H_{\min}(\rho) - H_0(\sigma) \leq R(\sigma \to \rho) \leq H_0(\rho) - H_0(\sigma) .
\]

Proposition 21. If $|0\rangle$ denotes any pure state, then the following relations hold:

\begin{align}
R(|0\rangle \to \rho) &= H_{\min}(\rho) , \tag{A7}\\
R(\sigma \to |0\rangle) &= -H_0(\sigma) . \tag{A8}
\end{align}

Similar explicit values can be obtained in the case where either the initial state or the target state is mixed.

Proposition 22. If $\frac{\mathbb{1}_n}{n}$ denotes the fully mixed state on $\log n$ qubits, then:

\begin{align}
R(\tfrac{\mathbb{1}_n}{n} \to \rho) &= H_{\min}(\rho) - \log n , \tag{A9}\\
R(\sigma \to \tfrac{\mathbb{1}_n}{n}) &= \log n - H_0(\sigma) . \tag{A10}
\end{align}


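Proof of Prop. 20. Lower bound: Let $\lambda_1 = H_{\min}(\rho) = -\log\lambda_{\max}(\rho)$ and $\lambda_2 = H_0(\sigma) = \log\mathrm{rank}\,\sigma$. By Proposition 19, we have $2^{-\lambda_1}\mathbb{1}_{2^{\lambda_1}} \succ \rho$ and by Proposition 17, $\sigma \succ 2^{-\lambda_2}\mathbb{1}_{2^{\lambda_2}}$. The majorization carries over to the tensor product, $2^{-\lambda_1}\mathbb{1}_{2^{\lambda_1}} \otimes \sigma \succ 2^{-\lambda_2}\mathbb{1}_{2^{\lambda_2}} \otimes \rho$, and $\lambda_1 - \lambda_2$ is a valid maximization candidate for (A6).

Upper bound: Let $\lambda = R(\sigma \to \rho)$, satisfying $\sigma \overset{\lambda}{\succ} \rho$. Proposition 18 immediately yields $2^{\lambda} \leq \frac{\mathrm{rank}\,\rho}{\mathrm{rank}\,\sigma}$, and

\[
R(\sigma \to \rho) = \lambda \leq \log\mathrm{rank}\,\rho - \log\mathrm{rank}\,\sigma .
\]

Recalling the definition of the Rényi-0 entropy $H_0(\sigma) = \log\mathrm{rank}\,\sigma$ yields the required upper bound. □

Proof of Prop. 21. Equation (A8) follows from the bounds of Proposition 20, which become tight in this special case. Equality (A7) is a direct consequence of Prop. 19. □

Proof of Prop. 22. The bounds of Proposition 20 become tight for (A10). Equality (A9) is again a consequence of Prop. 19, recalling Prop. 8, which allows us to write $|0\rangle\langle 0| \overset{\lambda + \log n}{\succ} \rho$ instead of $\frac{\mathbb{1}_n}{n} \overset{\lambda}{\succ} \rho$. □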
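The bounds of Prop. 20 and the closed forms (A7) to (A10) only involve the largest eigenvalue and the rank, so they can be evaluated from the spectra alone. The sketch below is a small illustration; the function names, the rank tolerance and the example spectra are assumptions made for this purpose.

```python
import math


def h_min(eigs):
    """Min-entropy in bits: -log2 of the largest eigenvalue."""
    return -math.log2(max(eigs))


def h_0(eigs, tol=1e-12):
    """Renyi-0 entropy in bits: log2 of the rank."""
    return math.log2(sum(1 for p in eigs if p > tol))


def absorbed_randomness_bounds(sigma_eigs, rho_eigs):
    """Lower and upper bounds on R(sigma -> rho) from Prop. 20."""
    lower = h_min(rho_eigs) - h_0(sigma_eigs)
    upper = h_0(rho_eigs) - h_0(sigma_eigs)
    return lower, upper


sigma = [0.5, 0.5]
rho = [0.5, 0.25, 0.25]

# Generic case: the bounds do not coincide.
print(absorbed_randomness_bounds(sigma, rho))

# Pure target state: both bounds equal -H_0(sigma), as in (A8).
print(absorbed_randomness_bounds(sigma, [1.0]))

# Fully mixed target on 2 qubits: both bounds equal log n - H_0(sigma), as in (A10).
print(absorbed_randomness_bounds(sigma, [0.25] * 4))
```

For a pure input state the lower bound equals $H_{\min}(\rho)$, which by (A7) is the exact value of $R$, while the upper bound $H_0(\rho)$ is in general not attained.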
Appendix B: Derivation of the Main Result:
Formulation as Semidefinite Program 

Let $\mathcal{H}_X$ be a quantum system in the state $\sigma_X$. Let $\mathcal{H}_R$ be an additional quantum system and let $|\sigma\rangle_{XR}$ be a purification of $\sigma_X$.

Suppose we want to bring the system $X$ into a given state $\rho_{X'R}$ with a lambda-majorization (here $\rho_{X'R}$ is not necessarily pure; giving the joint state with $R$ allows us to specify which correlations we want to preserve). The task is then the following.

Task. Find the best (maximal) $\lambda$, such that there exists a completely positive, $2^{-\lambda}$-subunital, trace-nonincreasing map $\mathcal{T}_{X\to X'}$ satisfying $\mathcal{T}_{X\to X'}(\sigma_{XR}) = \rho_{X'R}$.

In other words, we would like to find the trace-nonincreasing channel that satisfies $\mathcal{T}_{X\to X'}(\sigma_{XR}) = \rho_{X'R}$ and that has the smallest possible $\|\mathcal{T}_{X\to X'}(\mathbb{1}_X)\|_\infty$.

This problem can be formulated as a semidefinite program in terms of the variables $\alpha$ (defined as $\alpha = 2^{-\lambda}$) and $\mathcal{T}_{X\to X'}$ (through its Choi-Jamiolkowski map $T_{XX'}$). (See [60, 61] for an introduction to SDPs in a style similar to what we use here.)

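Primal

$$
\begin{aligned}
\text{minimize:}\quad & \alpha \\
\text{subject to:}\quad
 & \mathcal{T}_{X \to X'}(\mathbb{1}_X) \leqslant \alpha\, \mathbb{1}_{X'} && :\ \omega_{X'} && \text{(B1a)} \\
 & \mathcal{T}^{\dagger}_{X \leftarrow X'}(\mathbb{1}_{X'}) \leqslant \mathbb{1}_X && :\ X_X && \text{(B1b)} \\
 & \mathcal{T}_{X \to X'}(\sigma_{XR}) = \rho_{X'R} && :\ Z_{X'R} && \text{(B1c)}
\end{aligned}
$$

Dual

$$
\begin{aligned}
\text{maximize:}\quad & \operatorname{tr}(Z_{X'R}\, \rho_{X'R}) - \operatorname{tr} X_X \\
\text{subject to:}\quad
 & \operatorname{tr} \omega_{X'} \leqslant 1 && \text{(B2a)} \\
 & \operatorname{tr}_R\!\left[\sigma_{XR}^{t_X} Z_{X'R}\right] \leqslant \mathbb{1}_X \otimes \omega_{X'} + X_X \otimes \mathbb{1}_{X'} && \text{(B2b)}
\end{aligned}
$$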


Note that since the channel does not touch $\sigma_R$, we must
necessarily have $\sigma_R = \rho_R$. Let $E$ be an environment that
purifies the output state as $\rho_{X'RE}$. As two purifications
with the same reduced state on $R$, the two states $\sigma_{XR}$
and $\rho_{X'RE}$ must be related by an isometry $V_{X \to X'E}$ as
$\rho_{X'RE} = V_{X \to X'E}\, \sigma_{XR}\, V^\dagger$. We can choose $V_{X \to X'E}$ to be
a partial isometry such that $V V^\dagger = \Pi_{X'E}$, the projector
on the support of $\rho_{X'E}$, and $V^\dagger V = \Pi_X$, the projector
on the support of $\sigma_X$.

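Now, define $\mathcal{T}$ by its Stinespring dilation

$$ \mathcal{T}_{X \to X'}(\cdot) = \operatorname{tr}_E\!\left[V_{X \to X'E}\, (\cdot)\, V^\dagger\right] \tag{B3} $$

and let $\alpha = \|\mathcal{T}(\mathbb{1}_X)\|_\infty$. We will show that this choice of
variables is feasible and optimal, and will derive a more
explicit value of $\alpha$.

Condition (B1a) is satisfied by definition and (B1b)
because $V$ is a partial isometry. Also, verifying
condition (B1c),

$$ \mathcal{T}_{X \to X'}(\sigma_{XR}) = \operatorname{tr}_E\!\left[V_{X \to X'E}\, \sigma_{XR}\, V^\dagger\right] = \operatorname{tr}_E \rho_{X'RE} = \rho_{X'R} . \tag{B4} $$

Now calculate

$$ \alpha = \|\mathcal{T}(\mathbb{1}_X)\|_\infty = \|\operatorname{tr}_E V V^\dagger\|_\infty = \|\operatorname{tr}_E \Pi_{X'E}\|_\infty = \max_{\tau_{X'}} \operatorname{tr}\!\left[\Pi_{X'E}\, \tau_{X'}\right] = 2^{H_0(E|X')_\rho} . \tag{B5} $$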



We will now show that this value is optimal by exhibiting
a solution to the dual problem that achieves the same
value. Let $\omega_{X'} = \tau_{X'}$ be the optimal $\tau_{X'}$ for the
definition of $H_0(E|X')_\rho$ as in (B5), let $Z_{X'R} = \sigma_R^{-1} \otimes \omega_{X'}$ and
let $X_X = 0$. This choice is feasible since condition (B2a)
is automatically satisfied and condition (B2b) becomes



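$$
\operatorname{tr}_R\!\left[\sigma_{XR}^{t_X} Z_{X'R}\right]
= \operatorname{tr}_R\!\left[\sigma_{XR}^{t_X} \cdot \sigma_R^{-1}\right] \otimes \omega_{X'}
= \operatorname{tr}_R\!\left[\Phi_{X|R}\right] \otimes \omega_{X'}
= \Pi_X \otimes \omega_{X'} \leqslant \mathbb{1}_X \otimes \omega_{X'} , \tag{B6}
$$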



where $\Phi_{X|R}$ is a maximally entangled state on the
supports of $\sigma_X$ and $\sigma_R$. Let $\rho_{X'RE}$ and $V_{X \to X'E}$ be defined
as before. The value achieved by this choice of dual
variables is then



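$$
\begin{aligned}
\operatorname{tr}\!\left[Z_{X'R}\, \rho_{X'R}\right]
 &= \operatorname{tr}\!\left[\sigma_R^{-1} \otimes \omega_{X'} \cdot \rho_{X'R}\right] && \text{(B7)} \\
 &= \operatorname{tr}\!\left[\sigma_R^{-1} \otimes \omega_{X'} \cdot V_{X \to X'E}\, \sigma_{XR}\, V^\dagger\right] && \text{(B8)} \\
 &= \operatorname{tr}\!\left[\omega_{X'} \cdot V_{X \to X'E}\, \Phi_{X|R}\, V^\dagger\right] \\
 &= \operatorname{tr}\!\left[\omega_{X'}\, \Pi_{X'E}\right] \\
 &= 2^{H_0(E|X')_\rho} . && \text{(B9)}
\end{aligned}
$$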



From this, we conclude that the optimal $\lambda$ for this
problem is

$$ \lambda_{\mathrm{opt}} = -H_0(E|X')_\rho , \tag{B10} $$

where $\rho_{X'RE}$ is a purification of $\rho_{X'R}$.

We note also that this gives the optimal amount of
extracted work. Of course, any $\lambda \leqslant \lambda_{\mathrm{opt}}$ is also a solution.
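As an illustration of (B5) and (B10), consider the special case in which the reduced state $\rho_{X'E}$ of the purification is classical, i.e. diagonal in a product basis. Then $\Pi_{X'E}$ is diagonal as well, $\operatorname{tr}_E \Pi_{X'E}$ counts, for each value of $X'$, the values of $E$ that lie in the support, and the norm in (B5) is the largest such count. The sketch below is a minimal example under this classicality assumption, with base-2 logarithms; the function name `lambda_opt_classical` and the two example distributions are illustrative choices, not part of the derivation.

```python
from math import log2

def lambda_opt_classical(p_xe):
    """lambda_opt = -H_0(E|X') for a classical distribution p_xe[(x, e)]."""
    support_sizes = {}
    for (x, e), prob in p_xe.items():
        if prob > 0:
            # Each (x, e) in the support adds 1 to the x-th diagonal
            # entry of tr_E Pi_{X'E}.
            support_sizes[x] = support_sizes.get(x, 0) + 1
    alpha = max(support_sizes.values())  # infinity norm of tr_E Pi_{X'E}
    return -log2(alpha)

# Erasure of a maximally mixed bit: X' is reset to 0 and the purifying
# system E carries a uniformly random bit.
erase = {(0, 0): 0.5, (0, 1): 0.5}

# Identity on a maximally mixed bit: all correlations with R are kept,
# and E is trivial (a single value).
keep = {(0, 0): 0.5, (1, 0): 0.5}

lam_erase = lambda_opt_classical(erase)
lam_keep = lambda_opt_classical(keep)
print(lam_erase, lam_keep)
```

Under these assumptions the erasure example gives $\lambda_{\mathrm{opt}} = -1$, while the identity example gives $\lambda_{\mathrm{opt}} = 0$, which reflects the dependence on how much correlation with $R$ is retained.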



Appendix C: Renyi-zero entropy of the W state 

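Let $S$ and $M$ be two qubits in the state
$\rho_{SM} = \tfrac{1}{3} |00\rangle\langle 00|_{SM} + \tfrac{2}{3} |\Psi^+\rangle\langle\Psi^+|$
(where $|\Psi^+\rangle$ is the Bell state $|\Psi^+\rangle = \tfrac{1}{\sqrt{2}} \left[\, |01\rangle + |10\rangle \,\right]$).
Written out explicitly in the basis $\{|0\rangle, |1\rangle\}$,

$$
\rho_{SM} =
\begin{pmatrix}
1/3 & & & \\
 & 1/3 & 1/3 & \\
 & 1/3 & 1/3 & \\
 & & &
\end{pmatrix} .
$$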
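(Empty entries are zero.) The projector on its support is

$$
\Pi_{SM} =
\begin{pmatrix}
1 & & & \\
 & 1/2 & 1/2 & \\
 & 1/2 & 1/2 & \\
 & & &
\end{pmatrix} .
$$

We would like to compute the quantity

$$ 2^{H_0(S|M)_\rho} = \max_{\sigma_M \text{ dens. op.}} \operatorname{tr}\!\left[\Pi_{SM}\, \sigma_M\right] . $$

Let $\sigma_M = \begin{pmatrix} s_1 & \cdot \\ \cdot & 1 - s_1 \end{pmatrix}$; then

$$ \operatorname{tr}\!\left[\Pi_{SM} \left(\mathbb{1}_S \otimes \sigma_M\right)\right] = s_1 + \tfrac{1}{2} (1 - s_1) + \tfrac{1}{2} s_1 = \tfrac{1}{2} + s_1 . \tag{C1} $$

Under the constraint $0 \leqslant s_1 \leqslant 1$, this expression is clearly
maximized when $s_1 = 1$, yielding the value

$$ H_0(S|M)_\rho = \log \tfrac{3}{2} . $$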
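The maximization in this appendix can be reproduced with a few lines of exact arithmetic. The following is a minimal sketch, assuming base-2 logarithms and the basis ordering $|00\rangle, |01\rangle, |10\rangle, |11\rangle$ (first label $S$, second label $M$); `objective` is a hypothetical helper name. It restricts $\sigma_M$ to diagonal matrices, which loses nothing here because $\operatorname{tr}_S \Pi_{SM}$ is diagonal, so the off-diagonal entries of $\sigma_M$ do not enter the trace.

```python
from fractions import Fraction
from math import log2

HALF = Fraction(1, 2)

# Projector onto span{|00>, |Psi+>} in the basis |00>, |01>, |10>, |11>.
PI_SM = [
    [Fraction(1), 0, 0, 0],
    [0, HALF, HALF, 0],
    [0, HALF, HALF, 0],
    [0, 0, 0, 0],
]

def objective(s1):
    """tr[Pi_SM (1_S (x) sigma_M)] for sigma_M = diag(s1, 1 - s1)."""
    # 1_S (x) sigma_M is diagonal in this basis: (s1, 1-s1, s1, 1-s1),
    # so only the diagonal of Pi_SM contributes to the trace.
    diag = [s1, 1 - s1, s1, 1 - s1]
    return sum(PI_SM[i][i] * diag[i] for i in range(4))

# Scan s1 over a grid in [0, 1]; the objective is linear in s1, so the
# maximum is attained at an endpoint of the interval.
grid = [Fraction(k, 10) for k in range(11)]
values = [objective(s) for s in grid]

best = max(values)  # value of 2**H_0(S|M)
h0 = log2(best)     # H_0(S|M) in bits
print(best, h0)
```

The grid scan is only a sanity check of the closed form $\tfrac{1}{2} + s_1$; it is not a general method for computing conditional Rényi-zero entropies.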